Re: [Mesa-dev] [PATCH] egl: Mention if swrast is being forced
Quoting Eric Engestrom (2019-10-31 14:06:40) > On Thursday, 2019-10-31 07:35:04 +0000, Chris Wilson wrote: > > The system can be disabling HW acceleration unbeknownst to the user, > > leading to a long debug session trying to work out which component is > > failing. A quick mention that it is the environment override causing it would be > > very useful. > > --- > > src/egl/main/egldriver.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/src/egl/main/egldriver.c b/src/egl/main/egldriver.c > > index 0d8919aa0e1..132b12ab4cb 100644 > > --- a/src/egl/main/egldriver.c > > +++ b/src/egl/main/egldriver.c > > @@ -92,6 +92,8 @@ _eglMatchDriver(_EGLDisplay *disp) > > /* set options */ > > disp->Options.ForceSoftware = > >env_var_as_boolean("LIBGL_ALWAYS_SOFTWARE", false); > > + if (disp->Options.ForceSoftware) > > + _eglLog(_EGL_DEBUG, "Found 'LIBGL_ALWAYS_SOFTWARE' set, forcing > > swrast"); > > Good idea! > Reviewed-by: Eric Engestrom > > I might even suggest going one step further and make that an _EGL_WARNING, > so that users are always informed of this by default, without having to > set EGL_LOG_LEVEL. > > I think most users don't want to disable their hardware, so the annoyance > of this warning showing up for users who want it should be completely > offset by the usefulness of this information for those who don't. Is there a consensus to go with an always visible _EGL_WARNING instead? -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
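For reference, Eric's suggestion amounts to promoting the log level of the new message. A minimal sketch of the same hunk with _EGL_WARNING instead of _EGL_DEBUG (same _eglMatchDriver context as the patch above; message wording illustrative):

   disp->Options.ForceSoftware =
      env_var_as_boolean("LIBGL_ALWAYS_SOFTWARE", false);
   if (disp->Options.ForceSoftware)
      _eglLog(_EGL_WARNING,
              "Found 'LIBGL_ALWAYS_SOFTWARE' set, forcing software rendering");

With _EGL_WARNING the message is printed by default, so a user does not have to set EGL_LOG_LEVEL=debug to discover why they have fallen back to swrast.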
[Mesa-dev] [PATCH] egl: Mention if swrast is being forced
The system can be disabling HW acceleration unbeknownst to the user, leading to a long debug session trying to work out which component is failing. A quick mention that it is the environment override causing it would be very useful. --- src/egl/main/egldriver.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/egl/main/egldriver.c b/src/egl/main/egldriver.c index 0d8919aa0e1..132b12ab4cb 100644 --- a/src/egl/main/egldriver.c +++ b/src/egl/main/egldriver.c @@ -92,6 +92,8 @@ _eglMatchDriver(_EGLDisplay *disp) /* set options */ disp->Options.ForceSoftware = env_var_as_boolean("LIBGL_ALWAYS_SOFTWARE", false); + if (disp->Options.ForceSoftware) + _eglLog(_EGL_DEBUG, "Found 'LIBGL_ALWAYS_SOFTWARE' set, forcing swrast"); best_drv = _eglMatchAndInitialize(disp); if (!best_drv && !disp->Options.ForceSoftware) { -- 2.24.0.rc1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Switching to Gitlab Issues instead of Bugzilla?
Quoting Daniel Stone (2019-08-30 14:13:08) > Hi, > > On Thu, 29 Aug 2019 at 21:35, Chris Wilson wrote: > > Quoting Kristian Høgsberg (2019-08-29 21:20:12) > > > On Thu, Aug 29, 2019 at 12:44 PM Chris Wilson > > > wrote: > > > > Quoting Kenneth Graunke (2019-08-29 19:52:51) > > > > > - Moving bug reports between the kernel and Mesa would be harder. > > > > > We would have to open a bug in the other system. (Then again, > > > > > moving bugs between Mesa and X or Wayland would be easier...) > > > > > > > > All that I ask is that we move the kernel bugzilla along with it. Trying > > > > to keep abreast of the bugs in the whole stack is important. Fwiw, the > > > > kernel contains the > > > > https:bugs.freedesktop.org/enter_bug.cgi?product=DRI > > > > URL so would need some redirection for a few years... > > > > > > Would Rob's suggestion of creating a placeholder drm kernel repo for > > > the purpose of issue tracking work for you? > > We can definitely get placeholder DRM repos and move those issues too. > And set up redirects. > > (To be honest, rather than one-by-one redirects, my thought with > Bugzilla was just to statically scrape all the bug pages and leave > them as read-only static copies frozen in amber for time eternal. I'd > welcome help from anyone who had the time to figure out the scripting > and Apache runes required to make those be proper HTTP redirects, but > am definitely not going to be able to do that myself.) I'd be happy with a static page telling them how to file a new issue on gitlab. > > I think so. I just want a list of all bugs that may affect the code I'm > > working on, wherever they were filed. I have a search in bugs.fdo, I > > just need instructions on how to get the same from gitlab, hopefully in > > a compact format. > > It's not clear to me what you need. Can you please give more details? At the moment, I always have open a couple of searches which are basically Product: DRI, Mesa, xorg Component: Driver/intel, Drivers/DRI/i830, Drivers/DRI/i915, Drivers/DRI/i965, Drivers/Vulkan/intel, DRM/AMDgpu, DRM/Intel, IGT Status: NEW, ASSIGNED, REOPENED, NEEDINFO I would like a similar way of getting a quick glance at the issues under discussion and any new issues across the products -- basically I want a heads up in case I've broken something, however subtle. And sometimes you just need to trawl through every bug in case you missed something. > If you want cross-component search results in a single list, that's > not really something we can do today, and I don't know if it would > land any time soon. You can however subscribe to particular issue > labels, and when you see something that catches your eye add a 'todo' > for it, then the main UI shows all your outstanding todos, including > where people have mentioned you etc. One thing we did for bugzilla was set the default QA component to a mailing list, so we had a single place to subscribe to get all the spam. I presume something similar would be available to subscribe to every issue across a range of categories. > > The issue URL will also need to be stable so that we can include it in > > commits. From a glance, > > https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/861, > > looks like it will be adjusted if it ever gets moved. > > Sorry, 'moved' how? If you mean moved between components, yes the > issue will move and there will be a new URL for that. However, going > to that old URL will auto-redirect you to the new one. That's fine. 
I was just worrying about old links becoming stale, or worse, replaced, if the issue was moved (or whatever). -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Switching to Gitlab Issues instead of Bugzilla?
Quoting Kristian Høgsberg (2019-08-29 21:20:12) > On Thu, Aug 29, 2019 at 12:44 PM Chris Wilson > wrote: > > > > Quoting Kenneth Graunke (2019-08-29 19:52:51) > > > Some cons: > > > > > > - Moving bug reports between the kernel and Mesa would be harder. > > > We would have to open a bug in the other system. (Then again, > > > moving bugs between Mesa and X or Wayland would be easier...) > > > > All that I ask is that we move the kernel bugzilla along with it. Trying > > to keep abreast of the bugs in the whole stack is important. Fwiw, the > > kernel contains the https:bugs.freedesktop.org/enter_bug.cgi?product=DRI > > URL so would need some redirection for a few years... > > Would Rob's suggestion of creating a placeholder drm kernel repo for > the purpose of issue tracking work for you? I think so. I just want a list of all bugs that may affect the code I'm working on, wherever they were filed. I have a search in bugs.fdo, I just need instructions on how to get the same from gitlab, hopefully in a compact format. The issue URL will also need to be stable so that we can include it in commits. From a glance, https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/861, looks like it will be adjusted if it ever gets moved. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Switching to Gitlab Issues instead of Bugzilla?
Quoting Kenneth Graunke (2019-08-29 19:52:51) > Some cons: > > - Moving bug reports between the kernel and Mesa would be harder. > We would have to open a bug in the other system. (Then again, > moving bugs between Mesa and X or Wayland would be easier...) All that I ask is that we move the kernel bugzilla along with it. Trying to keep abreast of the bugs in the whole stack is important. Fwiw, the kernel contains the https://bugs.freedesktop.org/enter_bug.cgi?product=DRI URL so would need some redirection for a few years... -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] iris: Adapt to variable ppGTT size
Not all hardware is made equal and some does not have the full complement of 48b of address space. Ask the kernel what the actual size of the virtual address space allocated for contexts is, and bail if that is not enough to satisfy our static partitioning needs. Cc: Kenneth Graunke --- src/gallium/drivers/iris/iris_bufmgr.c | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/iris/iris_bufmgr.c b/src/gallium/drivers/iris/iris_bufmgr.c index 56261d4a947..a1e1b01c0e0 100644 --- a/src/gallium/drivers/iris/iris_bufmgr.c +++ b/src/gallium/drivers/iris/iris_bufmgr.c @@ -1617,6 +1617,21 @@ iris_reg_read(struct iris_bufmgr *bufmgr, uint32_t offset, uint64_t *result) return ret; } +static uint64_t iris_gtt_size(int fd) +{ + /* +* We use the default (already allocated) context to determine the default +* configuration of the virtual address space. +*/ + struct drm_i915_gem_context_param p = { + .param = I915_CONTEXT_PARAM_GTT_SIZE, + }; + if (!drm_ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p)) + return p.value; + + return 0; +} + /** * Initializes the GEM buffer manager, which uses the kernel to allocate, map, * and manage map buffer objections. @@ -1626,6 +1641,10 @@ iris_reg_read(struct iris_bufmgr *bufmgr, uint32_t offset, uint64_t *result) struct iris_bufmgr * iris_bufmgr_init(struct gen_device_info *devinfo, int fd) { + uint64_t gtt_size = iris_gtt_size(fd); + if (gtt_size <= IRIS_MEMZONE_OTHER_START) + return NULL; + struct iris_bufmgr *bufmgr = calloc(1, sizeof(*bufmgr)); if (bufmgr == NULL) return NULL; @@ -1661,7 +1680,7 @@ iris_bufmgr_init(struct gen_device_info *devinfo, int fd) _4GB - IRIS_BORDER_COLOR_POOL_SIZE); util_vma_heap_init(&bufmgr->vma_allocator[IRIS_MEMZONE_OTHER], IRIS_MEMZONE_OTHER_START, - (1ull << 48) - IRIS_MEMZONE_OTHER_START); + gtt_size - IRIS_MEMZONE_OTHER_START); // XXX: driconf bufmgr->bo_reuse = env_var_as_boolean("bo_reuse", true); -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
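For anyone wanting to check the same thing outside of Mesa, the ppGTT size can be queried with a few lines of userspace. A minimal, hedged probe (assumes libdrm for drmIoctl, a recent i915_drm.h providing I915_CONTEXT_PARAM_GTT_SIZE, and compilation against libdrm's cflags; the render node path is illustrative):

   #include <fcntl.h>
   #include <stdio.h>
   #include <xf86drm.h>
   #include <i915_drm.h>

   int main(void)
   {
      int fd = open("/dev/dri/renderD128", O_RDWR); /* adjust for your device */
      if (fd < 0)
         return 1;

      /* ctx_id = 0 queries the default context, matching iris_gtt_size() above */
      struct drm_i915_gem_context_param p = {
         .param = I915_CONTEXT_PARAM_GTT_SIZE,
      };
      if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p) == 0)
         printf("ppGTT size: %llu bytes (%llu GiB)\n",
                (unsigned long long)p.value,
                (unsigned long long)(p.value >> 30));
      else
         printf("GETPARAM(GTT_SIZE) failed; kernel too old?\n");

      return 0;
   }

A full 48b ppGTT reports 256 TiB here; parts with a smaller per-context address space report correspondingly less, which is what trips the IRIS_MEMZONE_OTHER_START check above.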
Re: [Mesa-dev] [Intel-gfx] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines
Quoting Jordan Justen (2019-03-31 10:57:09) > On 2019-03-25 03:58:59, Chris Wilson wrote: > > iris currently uses two distinct GEM contexts to have distinct logical > > HW contexts for the compute and render pipelines. However, using two > > distinct GEM contexts implies that they are distinct timelines, yet as > > they are a single GL context that implies they belong to a single > > timeline from the client perspective. Currently, fences are occasionally > > inserted to order the two timelines. Using 2 GEM contexts, also implies > > that we keep 2 ppGTT for identical buffer state. If we can create a > > single GEM context, with the right capabilities, we can have a single > > VM, a single timeline, but 2 logical HW contexts for the 2 pipelines. > > > > This is allowed through the new context interface that allows VM to be > > shared, timelines to be specified, and for the logical contexts to be > > constructed as the user desires. > > > > Cc: Joonas Lahtinen > > Cc: Kenneth Graunke > > --- > > src/gallium/drivers/iris/iris_batch.c | 16 ++- > > src/gallium/drivers/iris/iris_batch.h | 5 +-- > > src/gallium/drivers/iris/iris_context.c | 56 - > > 3 files changed, 60 insertions(+), 17 deletions(-) > > > > diff --git a/src/gallium/drivers/iris/iris_batch.c > > b/src/gallium/drivers/iris/iris_batch.c > > index 556422c38bc..40bcd939795 100644 > > --- a/src/gallium/drivers/iris/iris_batch.c > > +++ b/src/gallium/drivers/iris/iris_batch.c > > @@ -164,24 +164,16 @@ iris_init_batch(struct iris_batch *batch, > > struct iris_vtable *vtbl, > > struct pipe_debug_callback *dbg, > > struct iris_batch *all_batches, > > -enum iris_batch_name name, > > -uint8_t engine, > > -int priority) > > +uint32_t hw_id, > > +enum iris_batch_name name) > > { > > batch->screen = screen; > > batch->vtbl = vtbl; > > batch->dbg = dbg; > > batch->name = name; > > > > - /* engine should be one of I915_EXEC_RENDER, I915_EXEC_BLT, etc. */ > > - assert((engine & ~I915_EXEC_RING_MASK) == 0); > > - assert(util_bitcount(engine) == 1); > > - batch->engine = engine; > > - > > - batch->hw_ctx_id = iris_create_hw_context(screen->bufmgr); > > - assert(batch->hw_ctx_id); > > - > > - iris_hw_context_set_priority(screen->bufmgr, batch->hw_ctx_id, > > priority); > > + batch->hw_ctx_id = hw_id; > > + batch->engine = name; > > I think this should fall-back to I915_EXEC_RENDER if the old style > contexts are created. It uses the legacy uABI to map both name=COMPUTE to the render ring and name=RENDER to the render ring. Should we have more or either, or start using the many video engines, ctx->engines[] will be the only way to define the execbuf mapping. So name == engine, as used today, can remain invariant across the legacy I915_EXEC_RING API and future ctx->engines[] for the remaining forseeable future of GEM_EXECBUFFER2. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] drm-uapi: Pull i915_drm.h changes for context cloning
Quoting Jordan Justen (2019-03-31 10:53:06) > Where are these changes from (repo/commit)? It could be good to > reference in the commit message. They don't exist in drm-next yet, so they don't have a reference. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines
Quoting Kenneth Graunke (2019-03-26 17:01:57) > On Tuesday, March 26, 2019 12:16:20 AM PDT Chris Wilson wrote: > > Quoting Kenneth Graunke (2019-03-26 05:52:10) > > > On Monday, March 25, 2019 3:58:59 AM PDT Chris Wilson wrote: > > > > iris currently uses two distinct GEM contexts to have distinct logical > > > > HW contexts for the compute and render pipelines. However, using two > > > > distinct GEM contexts implies that they are distinct timelines, yet as > > > > they are a single GL context that implies they belong to a single > > > > timeline from the client perspective. Currently, fences are occasionally > > > > inserted to order the two timelines. Using 2 GEM contexts, also implies > > > > that we keep 2 ppGTT for identical buffer state. If we can create a > > > > single GEM context, with the right capabilities, we can have a single > > > > VM, a single timeline, but 2 logical HW contexts for the 2 pipelines. > > > > > > > > This is allowed through the new context interface that allows VM to be > > > > shared, timelines to be specified, and for the logical contexts to be > > > > constructed as the user desires. > > > > > > > > Cc: Joonas Lahtinen > > > > Cc: Kenneth Graunke > > > > --- > > > > src/gallium/drivers/iris/iris_batch.c | 16 ++- > > > > src/gallium/drivers/iris/iris_batch.h | 5 +-- > > > > src/gallium/drivers/iris/iris_context.c | 56 - > > > > 3 files changed, 60 insertions(+), 17 deletions(-) > > > > > > Hi Chris, > > > > > > I don't think that I want the single timeline option. It seems like > > > we've been moving away from implicit sync for a long time, and the > > > explicit sync code we have is pretty straightforward and seems to do > > > the trick. Jason and I also chatted briefly, and we don't necessarily > > > want to a strict submission-order between render/compute. > > > > I disagree if you think this means more implicit sync. It is setting up > > the GEM context to an exact match of the GL context, by _explicit_ > > control of the timeline. Then the fences you do export from inside the > > GL context do not need to be faked to be a composite of the pair of > > contexts. You still have explicit fences, and you have explicit control > > over the definition of their timeline. > > With regard to multiple GL contexts, yes, everything remains explicit. > But having 2-3 separate timelines within a GL context allows us to > reorder work behind GL's back, which is all the rage these days for > performance. Tilers do it all the time. Position-only bucketing may > require it. I'd really like to start treating render and compute as > distinct asynchronous queues. At the very least, experimenting with > that and not tying my hands to a particular behavior. That's a reasonable argument. If you want to try and keep the GL semantics intact while playing with ordering underneath, have fun! The only problem I forsee if there is any observable through which the pipelines can determine their ordering / concurrency (sampling a common buffer or clock) that might construe a violation. > There may be some use for single timeline, though. Attaching images as > compute shader inputs may require CCS/HiZ resolves, which have to happen > on the RCS. Right now, I do those on IRIS_BATCH_RENDER, which mean that > it backs up behind any queued render work. Ideally, I'd do those on a > third context, which could be tied to the compute timeline, so the > resolves and the compute job can both execute ahead of queued rendering, > but still back to back. 
I have an inkling that timelines should be first class for userspace to control exactly. But I have not seen anything close to a use case to justify that (yet). And by the time a usecase should arise, we will probably be onto the next shiny. That's the problem with cloudy crystal balls. > > > Separating the VMA from the context state image seems like absolutely > > > the right thing to do - as you said, they're separate in hardware, > > > and no real reason to tie it together. I would be in favor of new > > > uABI for that. > > > > > > I don't think there will be much overhead reduction from sharing the > > > VMA here though. It's very plausible that the compositor might want > > > to run between render and compute batches, at which point we end up > > > doing page directory loads anyway. I have also heard rumors about bit > > > 47
Re: [Mesa-dev] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines
Quoting Kenneth Graunke (2019-03-26 05:52:10) > On Monday, March 25, 2019 3:58:59 AM PDT Chris Wilson wrote: > > iris currently uses two distinct GEM contexts to have distinct logical > > HW contexts for the compute and render pipelines. However, using two > > distinct GEM contexts implies that they are distinct timelines, yet as > > they are a single GL context that implies they belong to a single > > timeline from the client perspective. Currently, fences are occasionally > > inserted to order the two timelines. Using 2 GEM contexts, also implies > > that we keep 2 ppGTT for identical buffer state. If we can create a > > single GEM context, with the right capabilities, we can have a single > > VM, a single timeline, but 2 logical HW contexts for the 2 pipelines. > > > > This is allowed through the new context interface that allows VM to be > > shared, timelines to be specified, and for the logical contexts to be > > constructed as the user desires. > > > > Cc: Joonas Lahtinen > > Cc: Kenneth Graunke > > --- > > src/gallium/drivers/iris/iris_batch.c | 16 ++- > > src/gallium/drivers/iris/iris_batch.h | 5 +-- > > src/gallium/drivers/iris/iris_context.c | 56 - > > 3 files changed, 60 insertions(+), 17 deletions(-) > > Hi Chris, > > I don't think that I want the single timeline option. It seems like > we've been moving away from implicit sync for a long time, and the > explicit sync code we have is pretty straightforward and seems to do > the trick. Jason and I also chatted briefly, and we don't necessarily > want to a strict submission-order between render/compute. I disagree if you think this means more implicit sync. It is setting up the GEM context to an exact match of the GL context, by _explicit_ control of the timeline. Then the fences you do export from inside the GL context do not need to be faked to be a composite of the pair of contexts. You still have explicit fences, and you have explicit control over the definition of their timeline. > Separating the VMA from the context state image seems like absolutely > the right thing to do - as you said, they're separate in hardware, > and no real reason to tie it together. I would be in favor of new > uABI for that. > > I don't think there will be much overhead reduction from sharing the > VMA here though. It's very plausible that the compositor might want > to run between render and compute batches, at which point we end up > doing page directory loads anyway. I have also heard rumors about bit > 47 becoming magical at some point which may prohibit us from sharing... Yeah, but that doesn't actually affect the context setup, just how you decide to use it in end. And by that point, you'll be forced into using this new uABI anyway or something entirely different :-p > Context cloning seems OK, but I'm always pretty hesitant to add new > uABI unless it's strictly necessary. In this case, we can do the same > thing with a little bit of userspace code, so I'm not sure it's worth > adding that... Actually you cannot do the same without some of the new uABI either, since previously you did not have all the parameters exposed. > I would love to see an iris patch to use the new > I915_CONTEXT_PARAM_RECOVERABLE option without the other dependencies. https://gitlab.freedesktop.org/ickle/mesa/commit/84d9cb1d8d98a50dcceea19ccbc3836b15cf73ae -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] iris: Handle GPU recovery
We want to opt out of the automatic GPU recovery and replay performed by the kernel of a guilty context after a GPU reset as our incremental batch construction very often implies that subsequent batches are a GPU reset are incomplete and will trigger fresh GPU hangs. As we are aware of how we need to reset the context state, but the kernel isn't, tell the kernel to cancel the inflight rendering and immediately report the GPU hang, where upon we reconstruct a fresh context for the next batch. --- src/gallium/drivers/iris/iris_batch.c | 92 ++--- src/gallium/drivers/iris/iris_batch.h | 13 src/gallium/drivers/iris/iris_bufmgr.c | 25 +++ src/gallium/drivers/iris/iris_context.c | 24 +-- src/gallium/drivers/iris/iris_context.h | 12 ++-- 5 files changed, 125 insertions(+), 41 deletions(-) diff --git a/src/gallium/drivers/iris/iris_batch.c b/src/gallium/drivers/iris/iris_batch.c index 40bcd939795..2edca3d558f 100644 --- a/src/gallium/drivers/iris/iris_batch.c +++ b/src/gallium/drivers/iris/iris_batch.c @@ -163,6 +163,7 @@ iris_init_batch(struct iris_batch *batch, struct iris_screen *screen, struct iris_vtable *vtbl, struct pipe_debug_callback *dbg, +iris_init_context_fn init_context, struct iris_batch *all_batches, uint32_t hw_id, enum iris_batch_name name) @@ -171,6 +172,7 @@ iris_init_batch(struct iris_batch *batch, batch->vtbl = vtbl; batch->dbg = dbg; batch->name = name; + batch->init_context = init_context; batch->hw_ctx_id = hw_id; batch->engine = name; @@ -212,6 +214,8 @@ iris_init_batch(struct iris_batch *batch, } iris_batch_reset(batch); + + batch->init_context(batch->screen, batch, batch->vtbl, batch->dbg); } static struct drm_i915_gem_exec_object2 * @@ -443,6 +447,44 @@ iris_finish_batch(struct iris_batch *batch) batch->primary_batch_size = iris_batch_bytes_used(batch); } +static int +iris_recreate_context(struct iris_batch *batch) +{ + struct drm_i915_gem_context_create_ext_clone clone = { + .base = { .name = I915_CONTEXT_CREATE_EXT_CLONE }, + .clone_id = batch->hw_ctx_id, + .flags = ~I915_CONTEXT_CLONE_UNKNOWN, + }; + struct drm_i915_gem_context_create_ext arg = { + .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS, + .extensions = (uintptr_t)&clone, + }; + if (drm_ioctl(batch->screen->fd, + DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, + &arg)) + return -EIO; + + uint32_t old_ctx_id = batch->hw_ctx_id; + + batch->hw_ctx_id = arg.ctx_id; + batch->init_context(batch->screen, batch, batch->vtbl, batch->dbg); + + for (int b = 0; b < ARRAY_SIZE(batch->other_batches); b++) { + struct iris_batch *other = batch->other_batches[b]; + if (other->hw_ctx_id != old_ctx_id) + continue; + + other->hw_ctx_id = arg.ctx_id; + other->init_context(other->screen, other, other->vtbl, other->dbg); + } + + drm_ioctl(batch->screen->fd, + DRM_IOCTL_I915_GEM_CONTEXT_DESTROY, + &old_ctx_id); + + return 0; +} + /** * Submit the batch to the GPU via execbuffer2. 
*/ @@ -483,17 +525,11 @@ submit_batch(struct iris_batch *batch) (uintptr_t)util_dynarray_begin(&batch->exec_fences); } - int ret = drm_ioctl(batch->screen->fd, - DRM_IOCTL_I915_GEM_EXECBUFFER2, - &execbuf); - if (ret != 0) { + int ret = 0; + if (drm_ioctl(batch->screen->fd, + DRM_IOCTL_I915_GEM_EXECBUFFER2, + &execbuf)) ret = -errno; - DBG("execbuf FAILED: errno = %d\n", -ret); - fprintf(stderr, "execbuf FAILED: errno = %d\n", -ret); - abort(); - } else { - DBG("execbuf succeeded\n"); - } for (int i = 0; i < batch->exec_count; i++) { struct iris_bo *bo = batch->exec_bos[i]; @@ -561,6 +597,25 @@ _iris_batch_flush(struct iris_batch *batch, const char *file, int line) int ret = submit_batch(batch); + batch->exec_count = 0; + batch->aperture_space = 0; + + struct iris_syncpt *syncpt = + ((struct iris_syncpt **) util_dynarray_begin(&batch->syncpts))[0]; + iris_syncpt_reference(screen, &batch->last_syncpt, syncpt); + + util_dynarray_foreach(&batch->syncpts, struct iris_syncpt *, s) + iris_syncpt_reference(screen, s, NULL); + util_dynarray_clear(&batch->syncpts); + + util_dynarray_clear(&batch->exec_fences); + + /* Start a new batch buffer. */ + iris_batch_reset(batch); + + if (ret == -EIO) + ret = iris_recreate_context(batch); + if (ret >= 0) { //if (iris->ctx.Const.ResetStrategy == GL_LOSE_CONTEXT_ON_RESET_ARB) //iris_check_for_reset(ice); @@ -574,25 +629,10 @@ _iris_batch_flush(struct iris_batch *batch, const char *file, int line) const bool color = INTEL_DEBUG & DEBUG_COLOR; fprintf(stderr, "%siris: Failed to subm
[Mesa-dev] [PATCH 1/3] drm-uapi: Pull i915_drm.h changes for context cloning
For use in GPU recovery and pipeline construction. --- include/drm-uapi/i915_drm.h | 389 +--- 1 file changed, 317 insertions(+), 72 deletions(-) diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h index d2792ab3640..59baacd265d 100644 --- a/include/drm-uapi/i915_drm.h +++ b/include/drm-uapi/i915_drm.h @@ -62,6 +62,28 @@ extern "C" { #define I915_ERROR_UEVENT "ERROR" #define I915_RESET_UEVENT "RESET" +/* + * i915_user_extension: Base class for defining a chain of extensions + * + * Many interfaces need to grow over time. In most cases we can simply + * extend the struct and have userspace pass in more data. Another option, + * as demonstrated by Vulkan's approach to providing extensions for forward + * and backward compatibility, is to use a list of optional structs to + * provide those extra details. + * + * The key advantage to using an extension chain is that it allows us to + * redefine the interface more easily than an ever growing struct of + * increasing complexity, and for large parts of that interface to be + * entirely optional. The downside is more pointer chasing; chasing across + * the boundary with pointers encapsulated inside u64. + */ +struct i915_user_extension { + __u64 next_extension; + __u32 name; + __u32 flags; /* All undefined bits must be zero. */ + __u32 rsvd[4]; /* Reserved for future use; must be zero. */ +}; + /* * MOCS indexes used for GPU surfaces, defining the cacheability of the * surface data and the coherency for this data wrt. CPU vs. GPU accesses. @@ -99,9 +121,14 @@ enum drm_i915_gem_engine_class { I915_ENGINE_CLASS_VIDEO = 2, I915_ENGINE_CLASS_VIDEO_ENHANCE = 3, + /* should be kept compact */ + I915_ENGINE_CLASS_INVALID = -1 }; +#define I915_ENGINE_CLASS_INVALID_NONE -1 +#define I915_ENGINE_CLASS_INVALID_VIRTUAL 0 + /** * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915 * @@ -319,6 +346,9 @@ typedef struct _drm_i915_sarea { #define DRM_I915_PERF_ADD_CONFIG 0x37 #define DRM_I915_PERF_REMOVE_CONFIG0x38 #define DRM_I915_QUERY 0x39 +#define DRM_I915_GEM_VM_CREATE 0x3a +#define DRM_I915_GEM_VM_DESTROY0x3b +/* Must be kept compact -- no holes */ #define DRM_IOCTL_I915_INITDRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t) #define DRM_IOCTL_I915_FLUSH DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH) @@ -367,6 +397,7 @@ typedef struct _drm_i915_sarea { #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey) #define DRM_IOCTL_I915_GEM_WAITDRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait) #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create) +#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext) #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy) #define DRM_IOCTL_I915_REG_READDRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read) #define DRM_IOCTL_I915_GET_RESET_STATS DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats) @@ -377,6 +408,8 @@ typedef struct _drm_i915_sarea { #define DRM_IOCTL_I915_PERF_ADD_CONFIG DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config) #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64) #define 
DRM_IOCTL_I915_QUERY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query) +#define DRM_IOCTL_I915_GEM_VM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control) +#define DRM_IOCTL_I915_GEM_VM_DESTROY DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control) /* Allow drivers to submit batchbuffers directly to hardware, relying * on the security mechanisms provided by hardware. @@ -476,6 +509,7 @@ typedef struct drm_i915_irq_wait { #define I915_SCHEDULER_CAP_ENABLED (1ul << 0) #define I915_SCHEDULER_CAP_PRIORITY (1ul << 1) #define I915_SCHEDULER_CAP_PREEMPTION(1ul << 2) +#define I915_SCHEDULER_CAP_SEMAPHORES(1ul << 3) #define I915_PARAM_HUC_STATUS 42 @@ -559,6 +593,14 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_MMAP_GTT_COHERENT 52 +/* + * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel + * execution through use of explicit fence support. + * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT. + */ +#define I915_PARAM_HAS_EXEC_SUBM
[Mesa-dev] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines
iris currently uses two distinct GEM contexts to have distinct logical HW contexts for the compute and render pipelines. However, using two distinct GEM contexts implies that they are distinct timelines, yet as they are a single GL context that implies they belong to a single timeline from the client perspective. Currently, fences are occasionally inserted to order the two timelines. Using 2 GEM contexts, also implies that we keep 2 ppGTT for identical buffer state. If we can create a single GEM context, with the right capabilities, we can have a single VM, a single timeline, but 2 logical HW contexts for the 2 pipelines. This is allowed through the new context interface that allows VM to be shared, timelines to be specified, and for the logical contexts to be constructed as the user desires. Cc: Joonas Lahtinen Cc: Kenneth Graunke --- src/gallium/drivers/iris/iris_batch.c | 16 ++- src/gallium/drivers/iris/iris_batch.h | 5 +-- src/gallium/drivers/iris/iris_context.c | 56 - 3 files changed, 60 insertions(+), 17 deletions(-) diff --git a/src/gallium/drivers/iris/iris_batch.c b/src/gallium/drivers/iris/iris_batch.c index 556422c38bc..40bcd939795 100644 --- a/src/gallium/drivers/iris/iris_batch.c +++ b/src/gallium/drivers/iris/iris_batch.c @@ -164,24 +164,16 @@ iris_init_batch(struct iris_batch *batch, struct iris_vtable *vtbl, struct pipe_debug_callback *dbg, struct iris_batch *all_batches, -enum iris_batch_name name, -uint8_t engine, -int priority) +uint32_t hw_id, +enum iris_batch_name name) { batch->screen = screen; batch->vtbl = vtbl; batch->dbg = dbg; batch->name = name; - /* engine should be one of I915_EXEC_RENDER, I915_EXEC_BLT, etc. */ - assert((engine & ~I915_EXEC_RING_MASK) == 0); - assert(util_bitcount(engine) == 1); - batch->engine = engine; - - batch->hw_ctx_id = iris_create_hw_context(screen->bufmgr); - assert(batch->hw_ctx_id); - - iris_hw_context_set_priority(screen->bufmgr, batch->hw_ctx_id, priority); + batch->hw_ctx_id = hw_id; + batch->engine = name; util_dynarray_init(&batch->exec_fences, ralloc_context(NULL)); util_dynarray_init(&batch->syncpts, ralloc_context(NULL)); diff --git a/src/gallium/drivers/iris/iris_batch.h b/src/gallium/drivers/iris/iris_batch.h index 2a68103c379..db1a4cbbe11 100644 --- a/src/gallium/drivers/iris/iris_batch.h +++ b/src/gallium/drivers/iris/iris_batch.h @@ -131,9 +131,8 @@ void iris_init_batch(struct iris_batch *batch, struct iris_vtable *vtbl, struct pipe_debug_callback *dbg, struct iris_batch *all_batches, - enum iris_batch_name name, - uint8_t ring, - int priority); + uint32_t hw_id, + enum iris_batch_name name); void iris_chain_to_new_batch(struct iris_batch *batch); void iris_batch_free(struct iris_batch *batch); void iris_batch_maybe_flush(struct iris_batch *batch, unsigned estimate); diff --git a/src/gallium/drivers/iris/iris_context.c b/src/gallium/drivers/iris/iris_context.c index a1d11755a24..aeb608c70bd 100644 --- a/src/gallium/drivers/iris/iris_context.c +++ b/src/gallium/drivers/iris/iris_context.c @@ -143,6 +143,57 @@ iris_destroy_context(struct pipe_context *ctx) unreachable("Unknown hardware generation"); \ } +static void create_hw_contexts(struct iris_screen *screen, + uint32_t *hw_id, + int priority) +{ +#define RCS0 {0, 0} + I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = { + .class_instance = { RCS0, RCS0 } + }; + struct drm_i915_gem_context_create_ext_setparam p_engines = { + .base = { + .name = I915_CONTEXT_CREATE_EXT_SETPARAM, +.next_extension = 0, /* end of chain */ + }, + .param = { + .param = 
I915_CONTEXT_PARAM_ENGINES, + .value = (uintptr_t)&engines, + .size = sizeof(engines), + }, + }; + struct drm_i915_gem_context_create_ext_setparam p_prio = { + .base = { + .name =I915_CONTEXT_CREATE_EXT_SETPARAM, + .next_extension = (uintptr_t)&p_engines, + }, + .param = { + .param = I915_CONTEXT_PARAM_PRIORITY, + .value = priority, + }, + }; + struct drm_i915_gem_context_create_ext arg = { + .flags = I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE | + I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS, + .extensions = (uintptr_t)&p_prio, + }; + + if (drm_ioctl(screen->fd, + DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, + &arg) == 0) { + for (int i = 0; i < IRIS_BATCH_COUNT; i++) + hw_id[i] = arg.ctx_id; + return; + } + + /* No fancy new features; create two contexts instead */ + for (int i = 0; i < IR
Re: [Mesa-dev] [PATCH] i965: Perform manual preemption checks between commands
Quoting Rafael Antognolli (2019-03-05 19:33:03) > On Tue, Mar 05, 2019 at 09:40:20AM +0000, Chris Wilson wrote: > > Not all commands support being preempted as they execute, and for those > > make sure we at least check for being preempted before we start so as to > > try and minimise the latency of whomever is more important than > > ourselves. > > > > Cc: Jari Tahvanainen , > > Cc: Rafael Antognolli > > Cc: Kenneth Graunke > > --- > > Always double check before you hit send. > > --- > > src/mesa/drivers/dri/i965/brw_defines.h | 1 + > > src/mesa/drivers/dri/i965/brw_draw.c| 7 +++ > > 2 files changed, 8 insertions(+) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > > b/src/mesa/drivers/dri/i965/brw_defines.h > > index 2729a54e144..ef71c556cca 100644 > > --- a/src/mesa/drivers/dri/i965/brw_defines.h > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > > @@ -1420,6 +1420,7 @@ enum brw_pixel_shader_coverage_mask_mode { > > > > #define MI_NOOP (CMD_MI | 0) > > > > +#define MI_ARB_CHECK (CMD_MI | 0x5 << 23) > > #define MI_BATCH_BUFFER_END (CMD_MI | 0xA << 23) > > > > #define MI_FLUSH (CMD_MI | (4 << 23)) > > diff --git a/src/mesa/drivers/dri/i965/brw_draw.c > > b/src/mesa/drivers/dri/i965/brw_draw.c > > index d07349419cc..a04e334ffc4 100644 > > --- a/src/mesa/drivers/dri/i965/brw_draw.c > > +++ b/src/mesa/drivers/dri/i965/brw_draw.c > > @@ -196,6 +196,13 @@ brw_emit_prim(struct brw_context *brw, > > if (verts_per_instance == 0 && !prim->is_indirect && !xfb_obj) > >return; > > > > + /* If this object is not itself preemptible, check before we begin. */ > > + if (!brw->object_preemption) { > > + BEGIN_BATCH(1); > > + OUT_BATCH(MI_ARB_CHECK); > > + ADVANCE_BATCH(); > > + } > > + > > "The command streamer will preempt in the case arbitration is enabled, > there is a pending execution list and this command is currently being > parsed." > > If there is a pending execution list, shouldn't we have been preempted > already, since mid-batch preemption is supposed to be enabled? No, it still only occurs on certain instructions, not every instruction boundary. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Perform manual preemption checks between commands
Not all commands support being preempted as they execute, and for those make sure we at least check for being preempted before we start so as to try and minimise the latency of whomever is more important than ourselves. Cc: Jari Tahvanainen , Cc: Rafael Antognolli Cc: Kenneth Graunke --- Always double check before you hit send. --- src/mesa/drivers/dri/i965/brw_defines.h | 1 + src/mesa/drivers/dri/i965/brw_draw.c| 7 +++ 2 files changed, 8 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 2729a54e144..ef71c556cca 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1420,6 +1420,7 @@ enum brw_pixel_shader_coverage_mask_mode { #define MI_NOOP(CMD_MI | 0) +#define MI_ARB_CHECK (CMD_MI | 0x5 << 23) #define MI_BATCH_BUFFER_END(CMD_MI | 0xA << 23) #define MI_FLUSH (CMD_MI | (4 << 23)) diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index d07349419cc..a04e334ffc4 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -196,6 +196,13 @@ brw_emit_prim(struct brw_context *brw, if (verts_per_instance == 0 && !prim->is_indirect && !xfb_obj) return; + /* If this object is not itself preemptible, check before we begin. */ + if (!brw->object_preemption) { + BEGIN_BATCH(1); + OUT_BATCH(MI_ARB_CHECK); + ADVANCE_BATCH(); + } + /* If we're set to always flush, do it before and after the primitive emit. * We want to catch both missed flushes that hurt instruction/state cache * and missed flushes of the render cache as it heads to other parts of -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Perform manual preemption checks between commands
Not all commands support being preempted as they execute, and for those make sure we at least check for being preempted before we start so as to try and minimise the latency of whomever is more important than ourselves. Cc: Jari Tahvanainen , Cc: Rafael Antognolli Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_defines.h | 1 + src/mesa/drivers/dri/i965/brw_draw.c| 7 +++ 2 files changed, 8 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 2729a54e144..ef71c556cca 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1420,6 +1420,7 @@ enum brw_pixel_shader_coverage_mask_mode { #define MI_NOOP(CMD_MI | 0) +#define MI_ARB_CHECK (CMD_MI | 0x5 << 23) #define MI_BATCH_BUFFER_END(CMD_MI | 0xA << 23) #define MI_FLUSH (CMD_MI | (4 << 23)) diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index d07349419cc..49bb531ae7b 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -196,6 +196,13 @@ brw_emit_prim(struct brw_context *brw, if (verts_per_instance == 0 && !prim->is_indirect && !xfb_obj) return; + /* If this object is not itself preemptible, check before we begin. */ + if (!brw->object_preemption) { + BEGIN_BATCH(3); + OUT_BATCH(MI_ARB_CHECK); + ADVANCE_BATCH(); + } + /* If we're set to always flush, do it before and after the primitive emit. * We want to catch both missed flushes that hurt instruction/state cache * and missed flushes of the render cache as it heads to other parts of -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] gallium: Implement APPLE_object_purgeable (iris, freedreno, vc4)
Quoting Emil Velikov (2019-02-28 11:44:28) > On Tue, 26 Feb 2019 at 21:52, Chris Wilson wrote: > > > > A few of the GEM drivers provide matching ioctls to allow control of > > their bo caches. Hook these up to APPLE_object_purgeable to allow > > clients to discard video memory under pressure where they are able to > > fallback to restoring content themselves, e.g. from their own (presumably > > compressed, on disk) caches. > > > > v2: Refactor the repeated resource purging. > > > > Cc: Eric Anholt > > Cc: Kenneth Graunke > > Cc: Rob Clark > > --- > > .../drivers/freedreno/freedreno_resource.c| 10 ++ > > .../drivers/freedreno/freedreno_screen.c | 1 + > > src/gallium/drivers/iris/iris_resource.c | 10 ++ > > src/gallium/drivers/iris/iris_screen.c| 1 + > > src/gallium/drivers/vc4/vc4_bufmgr.c | 15 ++ > > src/gallium/drivers/vc4/vc4_bufmgr.h | 3 + > > src/gallium/drivers/vc4/vc4_resource.c| 10 ++ > > src/gallium/drivers/vc4/vc4_screen.c | 3 + > > src/gallium/include/pipe/p_defines.h | 1 + > > src/gallium/include/pipe/p_screen.h | 20 +++ > > src/mesa/Makefile.sources | 2 + > > src/mesa/meson.build | 2 + > > src/mesa/state_tracker/st_cb_objectpurge.c| 141 ++ > > src/mesa/state_tracker/st_cb_objectpurge.h| 38 + > > src/mesa/state_tracker/st_context.c | 2 + > > src/mesa/state_tracker/st_extensions.c| 1 + > > 16 files changed, 260 insertions(+) > > create mode 100644 src/mesa/state_tracker/st_cb_objectpurge.c > > create mode 100644 src/mesa/state_tracker/st_cb_objectpurge.h > > > As-is this is, kind of, blocked on getting it right on all drivers. > Can I'd suggest splitting this in 4 patches: > - st/mesa - src/mesa + src/gallium/include + src/gallium/aux > - iris > - freedreno > - vc4 > > This way others can wire their drivers while the iris/freedreno/vc4 > review/testing is ongoing. Sure, even more so when the changes are non trivial as they will have to be pass the functional tests [to be written]. Hopefully, I can break iris locally. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] gallium: Implement APPLE_object_purgeable (iris, freedreno, vc4)
Quoting Eric Anholt (2019-02-27 02:19:32) > Overall, I'm hesitant to land support for actually doing anything with > APPLE_object_purgeable when there are no functional tests of it. I > don't mean to actually have tests that force purging, but at least > making sure that we don't accidentally break rendering by having marked > things purgeable at all. For test design you are thinking of a few of the more common operations with a purgeable/unpurgeable cycle in between? For each of the purgeable objects (i.e. every object type). And from a robustness pov, run through with GL_UNDEFINED_APPLE in between, so the objects are discarded before attempting to reuse (and gracefully failing). -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
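To make the shape of such a test concrete, here is a rough piglit-style sketch of the buffer-object cycle (the robustness variant would pass GL_UNDEFINED_APPLE and always expect to re-upload). It is illustrative only, assuming a GL context with GL_APPLE_object_purgeable exposed and the usual piglit/GL headers plus <string.h> and <stdbool.h>:

   static bool
   test_buffer_purgeable_cycle(void)
   {
      uint8_t in[4096], out[4096];
      for (unsigned i = 0; i < sizeof(in); i++)
         in[i] = i & 0xff;

      GLuint bo;
      glGenBuffers(1, &bo);
      glBindBuffer(GL_ARRAY_BUFFER, bo);
      glBufferData(GL_ARRAY_BUFFER, sizeof(in), in, GL_STATIC_DRAW);

      /* Mark the storage purgeable, then claim it back. */
      glObjectPurgeableAPPLE(GL_BUFFER_OBJECT_APPLE, bo, GL_VOLATILE_APPLE);
      GLenum state = glObjectUnpurgeableAPPLE(GL_BUFFER_OBJECT_APPLE, bo,
                                              GL_RETAINED_APPLE);

      /* The implementation may have purged the pages; the extension then
       * reports GL_UNDEFINED_APPLE and the client re-uploads from its own
       * cache. */
      if (state == GL_UNDEFINED_APPLE)
         glBufferData(GL_ARRAY_BUFFER, sizeof(in), in, GL_STATIC_DRAW);

      glGetBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(out), out);
      glDeleteBuffers(1, &bo);

      return memcmp(in, out, sizeof(in)) == 0 && glGetError() == GL_NO_ERROR;
   }

The same cycle would be repeated for the other object types (GL_TEXTURE, GL_RENDERBUFFER), ideally with a draw between the mark and the reclaim to catch the "marking purgeable broke rendering" case Eric is worried about.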
[Mesa-dev] [PATCH v2] gallium: Implement APPLE_object_purgeable (iris, freedreno, vc4)
A few of the GEM drivers provide matching ioctls to allow control of their bo caches. Hook these up to APPLE_object_purgeable to allow clients to discard video memory under pressure where they are able to fallback to restoring content themselves, e.g. from their own (presumably compressed, on disk) caches. v2: Refactor the repeated resource purging. Cc: Eric Anholt Cc: Kenneth Graunke Cc: Rob Clark --- .../drivers/freedreno/freedreno_resource.c| 10 ++ .../drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/iris/iris_resource.c | 10 ++ src/gallium/drivers/iris/iris_screen.c| 1 + src/gallium/drivers/vc4/vc4_bufmgr.c | 15 ++ src/gallium/drivers/vc4/vc4_bufmgr.h | 3 + src/gallium/drivers/vc4/vc4_resource.c| 10 ++ src/gallium/drivers/vc4/vc4_screen.c | 3 + src/gallium/include/pipe/p_defines.h | 1 + src/gallium/include/pipe/p_screen.h | 20 +++ src/mesa/Makefile.sources | 2 + src/mesa/meson.build | 2 + src/mesa/state_tracker/st_cb_objectpurge.c| 141 ++ src/mesa/state_tracker/st_cb_objectpurge.h| 38 + src/mesa/state_tracker/st_context.c | 2 + src/mesa/state_tracker/st_extensions.c| 1 + 16 files changed, 260 insertions(+) create mode 100644 src/mesa/state_tracker/st_cb_objectpurge.c create mode 100644 src/mesa/state_tracker/st_cb_objectpurge.h diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c b/src/gallium/drivers/freedreno/freedreno_resource.c index 163fa70312a..5638d8efe33 100644 --- a/src/gallium/drivers/freedreno/freedreno_resource.c +++ b/src/gallium/drivers/freedreno/freedreno_resource.c @@ -702,6 +702,15 @@ fd_resource_get_handle(struct pipe_screen *pscreen, rsc->slices[0].pitch * rsc->cpp, handle); } +static boolean +fd_resource_madvise(struct pipe_screen *pscreen, +struct pipe_resource *resource, +boolean dontneed) +{ + struct fd_resource *rsc = fd_resource(prsc); + return rsc->bo->madvise(rsc->bo, !dontneed); +} + static uint32_t setup_slices(struct fd_resource *rsc, uint32_t alignment, enum pipe_format format) { @@ -1196,6 +1205,7 @@ fd_resource_screen_init(struct pipe_screen *pscreen) pscreen->resource_create_with_modifiers = fd_resource_create_with_modifiers; pscreen->resource_from_handle = fd_resource_from_handle; pscreen->resource_get_handle = fd_resource_get_handle; + pscreen->resource_madvise = fd_resource_madvise; pscreen->resource_destroy = u_transfer_helper_resource_destroy; pscreen->transfer_helper = u_transfer_helper_create(&transfer_vtbl, diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index 5b107b87ba8..909cfecd6f9 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -194,6 +194,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_MIXED_COLOR_DEPTH_BITS: case PIPE_CAP_TEXTURE_BARRIER: case PIPE_CAP_INVALIDATE_BUFFER: + case PIPE_CAP_RESOURCE_MADVISE: return 1; case PIPE_CAP_VERTEXID_NOBASE: diff --git a/src/gallium/drivers/iris/iris_resource.c b/src/gallium/drivers/iris/iris_resource.c index b5829e94b36..d681dca7321 100644 --- a/src/gallium/drivers/iris/iris_resource.c +++ b/src/gallium/drivers/iris/iris_resource.c @@ -1199,6 +1199,15 @@ iris_flush_resource(struct pipe_context *ctx, struct pipe_resource *resource) { } +static boolean +iris_resource_madvise(struct pipe_screen *pscreen, + struct pipe_resource *resource, + boolean dontneed) +{ + struct iris_resource *res = (struct iris_resource *)resource; + return iris_bo_madvise(res->bo, dontneed); +} + void 
iris_flush_and_dirty_for_history(struct iris_context *ice, struct iris_batch *batch, @@ -1269,6 +1278,7 @@ iris_init_screen_resource_functions(struct pipe_screen *pscreen) pscreen->resource_create = u_transfer_helper_resource_create; pscreen->resource_from_user_memory = iris_resource_from_user_memory; pscreen->resource_from_handle = iris_resource_from_handle; + pscreen->resource_madvise = iris_resource_madvise; pscreen->resource_get_handle = iris_resource_get_handle; pscreen->resource_destroy = u_transfer_helper_resource_destroy; pscreen->transfer_helper = diff --git a/src/gallium/drivers/iris/iris_screen.c b/src/gallium/drivers/iris/iris_screen.c index d831ffbc0a2..ecccf6af5e2 100644 --- a/src/gallium/drivers/iris/iris_screen.c +++ b/src/gallium/drivers/iris/iris_screen.c @@ -172,6 +172,7 @@ iris_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_GLSL_TESS_LEVELS_AS_INPUTS: case PIPE_CAP_LOAD_CONSTBUF: case
[Mesa-dev] [PATCH] gallium: Implement APPLE_object_purgeable (iris, freedreno, vc4)
A few of the GEM drivers provide matching ioctls to allow control of their bo caches. Hook these up to APPLE_object_purgeable to allow clients to discard video memory under pressure where they are able to fallback to restoring content themselves, e.g. from their own (presumably compressed, on disk) caches. Cc: Eric Anholt Cc: Kenneth Graunke Cc: Rob Clark --- .../drivers/freedreno/freedreno_resource.c| 10 ++ .../drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/iris/iris_resource.c | 10 ++ src/gallium/drivers/iris/iris_screen.c| 1 + src/gallium/drivers/vc4/vc4_bufmgr.c | 15 ++ src/gallium/drivers/vc4/vc4_bufmgr.h | 3 + src/gallium/drivers/vc4/vc4_resource.c| 10 ++ src/gallium/drivers/vc4/vc4_screen.c | 3 + src/gallium/include/pipe/p_defines.h | 1 + src/gallium/include/pipe/p_screen.h | 20 +++ src/mesa/Makefile.sources | 2 + src/mesa/meson.build | 2 + src/mesa/state_tracker/st_cb_objectpurge.c| 161 ++ src/mesa/state_tracker/st_cb_objectpurge.h| 38 + src/mesa/state_tracker/st_context.c | 2 + src/mesa/state_tracker/st_extensions.c| 1 + 16 files changed, 280 insertions(+) create mode 100644 src/mesa/state_tracker/st_cb_objectpurge.c create mode 100644 src/mesa/state_tracker/st_cb_objectpurge.h diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c b/src/gallium/drivers/freedreno/freedreno_resource.c index 163fa70312a..5638d8efe33 100644 --- a/src/gallium/drivers/freedreno/freedreno_resource.c +++ b/src/gallium/drivers/freedreno/freedreno_resource.c @@ -702,6 +702,15 @@ fd_resource_get_handle(struct pipe_screen *pscreen, rsc->slices[0].pitch * rsc->cpp, handle); } +static boolean +fd_resource_madvise(struct pipe_screen *pscreen, +struct pipe_resource *resource, +boolean dontneed) +{ + struct fd_resource *rsc = fd_resource(prsc); + return rsc->bo->madvise(rsc->bo, !dontneed); +} + static uint32_t setup_slices(struct fd_resource *rsc, uint32_t alignment, enum pipe_format format) { @@ -1196,6 +1205,7 @@ fd_resource_screen_init(struct pipe_screen *pscreen) pscreen->resource_create_with_modifiers = fd_resource_create_with_modifiers; pscreen->resource_from_handle = fd_resource_from_handle; pscreen->resource_get_handle = fd_resource_get_handle; + pscreen->resource_madvise = fd_resource_madvise; pscreen->resource_destroy = u_transfer_helper_resource_destroy; pscreen->transfer_helper = u_transfer_helper_create(&transfer_vtbl, diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index 5b107b87ba8..909cfecd6f9 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -194,6 +194,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_MIXED_COLOR_DEPTH_BITS: case PIPE_CAP_TEXTURE_BARRIER: case PIPE_CAP_INVALIDATE_BUFFER: + case PIPE_CAP_RESOURCE_MADVISE: return 1; case PIPE_CAP_VERTEXID_NOBASE: diff --git a/src/gallium/drivers/iris/iris_resource.c b/src/gallium/drivers/iris/iris_resource.c index b5829e94b36..d681dca7321 100644 --- a/src/gallium/drivers/iris/iris_resource.c +++ b/src/gallium/drivers/iris/iris_resource.c @@ -1199,6 +1199,15 @@ iris_flush_resource(struct pipe_context *ctx, struct pipe_resource *resource) { } +static boolean +iris_resource_madvise(struct pipe_screen *pscreen, + struct pipe_resource *resource, + boolean dontneed) +{ + struct iris_resource *res = (struct iris_resource *)resource; + return iris_bo_madvise(res->bo, dontneed); +} + void iris_flush_and_dirty_for_history(struct iris_context *ice, struct 
iris_batch *batch, @@ -1269,6 +1278,7 @@ iris_init_screen_resource_functions(struct pipe_screen *pscreen) pscreen->resource_create = u_transfer_helper_resource_create; pscreen->resource_from_user_memory = iris_resource_from_user_memory; pscreen->resource_from_handle = iris_resource_from_handle; + pscreen->resource_madvise = iris_resource_madvise; pscreen->resource_get_handle = iris_resource_get_handle; pscreen->resource_destroy = u_transfer_helper_resource_destroy; pscreen->transfer_helper = diff --git a/src/gallium/drivers/iris/iris_screen.c b/src/gallium/drivers/iris/iris_screen.c index d831ffbc0a2..ecccf6af5e2 100644 --- a/src/gallium/drivers/iris/iris_screen.c +++ b/src/gallium/drivers/iris/iris_screen.c @@ -172,6 +172,7 @@ iris_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_GLSL_TESS_LEVELS_AS_INPUTS: case PIPE_CAP_LOAD_CONSTBUF: case PIPE_CAP_NIR_COMPACT_ARRAYS: + case PIPE_CA
Re: [Mesa-dev] [PATCH v2 2/2] isl: the display engine requires 64B alignment for linear surfaces
Quoting Lionel Landwerlin (2019-02-21 12:57:09) > I did not find the PRM bit that says it must be 64b aligned, but I can > see that's what i915 checks. > > Chris: If you have a pointer to it, I could add the quote. In amongst the register specs, PLANE_STRIDE: For Linear memory, this field specifies the stride in chunks of 64 bytes (1 cache line). -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
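In code terms the quoted PLANE_STRIDE restriction simply means the row pitch of a linear scanout surface has to be padded to a 64-byte multiple. A small hypothetical helper (not the isl code) to illustrate:

   #include <stdint.h>

   static inline uint32_t
   align_u32(uint32_t value, uint32_t alignment)
   {
      /* alignment must be a power of two */
      return (value + alignment - 1) & ~(alignment - 1);
   }

   /* Pitch for a linear surface that may be handed to the display engine:
    * PLANE_STRIDE is programmed in 64-byte (cache line) chunks, so pad the
    * natural pitch up to the next multiple of 64 bytes. */
   static uint32_t
   linear_scanout_pitch(uint32_t width_px, uint32_t cpp)
   {
      return align_u32(width_px * cpp, 64);
   }

For example, a 1366-wide XRGB8888 surface has a natural pitch of 5464 bytes, which gets padded to 5504 so the stride divides evenly into cache lines.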
Re: [Mesa-dev] [PATCH 1/2] isl: remove the cache line size alignment requirement
Quoting Lionel Landwerlin (2019-02-18 15:06:15) > On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote: > > There are formats which bpp are not aligned to a power-of-two and > > that can cause problems in the checks we do. > > > > The cacheline size was a requirement for using the BLT engine, which > > we don't use anymore except for a few things on old HW, so we drop it. > > > > Fixes CTS's CL#3500 test: > > > > dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm > > > > Signed-off-by: Samuel Iglesias Gonsálvez > > > That looks good to me : > Reviewed-by: Lionel Landwerlin > > I'm doing a CI run just to convince myself, so if you can wait for that. Is scanout a concern? The display engine also requires 64B alignment for linear. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] i965: Take responsibility for context recovery after any GPU hang
To make wedging even more likely, we use a new "no recovery" context parameter that tells the kernel to not even attempt to replay any batches in flight against the default context image, as experience shows the HW is not always robust enough to cope with the conflicting state. This allows us to always take over responsibility of rebuilding the lost context following a GPU hang. Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_bufmgr.c | 24 1 file changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index 1248f8b9fa4..a0cbc315b60 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -1589,6 +1589,28 @@ init_cache_buckets(struct brw_bufmgr *bufmgr) } } +static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + /* +* Upon declaring a GPU hang, the kernel will zap the guilty context +* back to the default logical HW state and attempt to continue on to +* our next submitted batchbuffer. However, we only send incremental +* logical state (e.g. we only ever setup invariant register state +* once in brw_initial_gpu_upload()) and so attempting to replay the +* next batchbuffer without the correct logical state can be fatal. +* Here we tell the kernel not to attempt to recover our context but +* immediately (on the next batchbuffer submission) report that the +* context is lost, and we will do the recovery ourselves -- 2 lost +* batches instead of a continual stream until we are banned, or the +* machine is dead. +*/ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = I915_CONTEXT_PARAM_RECOVERABLE, + }; + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p); +} + uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr) { @@ -1599,6 +1621,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr) return 0; } + init_context(bufmgr, create.ctx_id); + return create.ctx_id; } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
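With automatic recovery disabled, the interesting question becomes how userspace notices the loss. i965 already uses the reset-stats ioctl for its robustness support; a hedged sketch of that query (field interpretation illustrative, assumes libdrm and i915_drm.h):

   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>
   #include <xf86drm.h>
   #include <i915_drm.h>

   /* Ask the kernel whether this context has been involved in a GPU reset,
    * either as the guilty party (batch_active) or as an innocent victim whose
    * pending batches were discarded (batch_pending).  A non-zero count is the
    * cue to throw the context away and rebuild it. */
   static bool
   context_needs_rebuild(int fd, uint32_t ctx_id)
   {
      struct drm_i915_reset_stats stats;

      memset(&stats, 0, sizeof(stats));
      stats.ctx_id = ctx_id;

      if (drmIoctl(fd, DRM_IOCTL_I915_GET_RESET_STATS, &stats))
         return false; /* ioctl unsupported; nothing we can do */

      return stats.batch_active != 0 || stats.batch_pending != 0;
   }

The iris series earlier in this archive takes the other route of reacting to the -EIO returned by execbuf and cloning a fresh context on the spot.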
[Mesa-dev] [PATCH 2/3] drm-uapi: Update i915_drm.h for I915_CONTEXT_PARAM_RECOVERABLE
XXX Not in drm-next XXX Pull i915_drm.h to include commit ba4fda620a5f7db521aa9e0262cf49854c1b1d9c (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued) Author: Chris Wilson Date: Mon Feb 18 10:58:21 2019 + drm/i915: Optionally disable automatic recovery after a GPU reset for improved resilience in handling GPU hangs. --- include/drm-uapi/i915_drm.h | 114 1 file changed, 114 insertions(+) diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h index 16e452aa12d..43fb8ede2fe 100644 --- a/include/drm-uapi/i915_drm.h +++ b/include/drm-uapi/i915_drm.h @@ -412,6 +412,14 @@ typedef struct drm_i915_irq_wait { int irq_seq; } drm_i915_irq_wait_t; +/* + * Different modes of per-process Graphics Translation Table, + * see I915_PARAM_HAS_ALIASING_PPGTT + */ +#define I915_GEM_PPGTT_NONE0 +#define I915_GEM_PPGTT_ALIASING1 +#define I915_GEM_PPGTT_FULL2 + /* Ioctl to query kernel params: */ #define I915_PARAM_IRQ_ACTIVE1 @@ -529,6 +537,28 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51 +/* + * Once upon a time we supposed that writes through the GGTT would be + * immediately in physical memory (once flushed out of the CPU path). However, + * on a few different processors and chipsets, this is not necessarily the case + * as the writes appear to be buffered internally. Thus a read of the backing + * storage (physical memory) via a different path (with different physical tags + * to the indirect write via the GGTT) will see stale values from before + * the GGTT write. Inside the kernel, we can for the most part keep track of + * the different read/write domains in use (e.g. set-domain), but the assumption + * of coherency is baked into the ABI, hence reporting its true state in this + * parameter. + * + * Reports true when writes via mmap_gtt are immediately visible following an + * lfence to flush the WCB. + * + * Reports false when writes via mmap_gtt are indeterminately delayed in an in + * internal buffer and are _not_ immediately visible to third parties accessing + * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC + * communications channel when reporting false is strongly disadvised. + */ +#define I915_PARAM_MMAP_GTT_COHERENT 52 + typedef struct drm_i915_getparam { __s32 param; /* @@ -1456,9 +1486,93 @@ struct drm_i915_gem_context_param { #define I915_CONTEXT_MAX_USER_PRIORITY 1023 /* inclusive */ #define I915_CONTEXT_DEFAULT_PRIORITY0 #define I915_CONTEXT_MIN_USER_PRIORITY -1023 /* inclusive */ + /* +* When using the following param, value should be a pointer to +* drm_i915_gem_context_param_sseu. +*/ +#define I915_CONTEXT_PARAM_SSEU0x7 + +/* + * Not all clients may want to attempt automatic recover of a context after + * a hang (for example, some clients may only submit very small incremental + * batches relying on known logical state of previous batches which will never + * recover correctly and each attempt will hang), and so would prefer that + * the context is forever banned instead. + * + * If set to false (0), after a reset, subsequent (and in flight) rendering + * from this context is discarded, and the client will need to create a new + * context to use instead. + * + * If set to true (1), the kernel will automatically attempt to recover the + * context by skipping the hanging batch and executing the next batch starting + * from the default context state (discarding the incomplete logical context + * state lost due to the reset). + * + * On creation, all new contexts are marked as recoverable. 
+ */ +#define I915_CONTEXT_PARAM_RECOVERABLE 0x8 __u64 value; }; +/** + * Context SSEU programming + * + * It may be necessary for either functional or performance reason to configure + * a context to run with a reduced number of SSEU (where SSEU stands for Slice/ + * Sub-slice/EU). + * + * This is done by configuring SSEU configuration using the below + * @struct drm_i915_gem_context_param_sseu for every supported engine which + * userspace intends to use. + * + * Not all GPUs or engines support this functionality in which case an error + * code -ENODEV will be returned. + * + * Also, flexibility of possible SSEU configuration permutations varies between + * GPU generations and software imposed limitations. Requesting such a + * combination will return an error code of -EINVAL. + * + * NOTE: When perf/OA is active the context's SSEU configuration is ignored in + * favour of a single global setting. + */ +struct drm_i915_gem_context_param_sseu { + /* +* Engine class & instance to be configured or queried. +*/ + __u16 engine_class; + __u16 engine_instance; + + /* +* Unused for now. Must be cleared to zero. +*/ + __u32 flags; + + /* +* Ma
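For the SSEU parameter pulled in by this header sync, the documented calling convention is that .value carries a pointer to drm_i915_gem_context_param_sseu. A rough sketch of a query, using only the fields quoted in the (truncated) hunk above; engine class 0 is assumed to be the render engine, and the remaining struct fields are left elided just as they are in the hunk:

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include "drm-uapi/i915_drm.h"

/* Sketch only: query the SSEU configuration of one engine.  Per the header
 * comment, .value points at drm_i915_gem_context_param_sseu; unsupported
 * GPUs/engines return -ENODEV. */
static int
context_get_render_sseu(int fd, uint32_t ctx_id,
                        struct drm_i915_gem_context_param_sseu *sseu)
{
   memset(sseu, 0, sizeof(*sseu));
   sseu->engine_class = 0;      /* assumed: render */
   sseu->engine_instance = 0;
   sseu->flags = 0;             /* must be cleared, per the header */

   struct drm_i915_gem_context_param p = {
      .ctx_id = ctx_id,
      .param = I915_CONTEXT_PARAM_SSEU,
      .value = (uintptr_t)sseu,
   };
   return drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p);
}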
[Mesa-dev] [PATCH 1/3] i965: Be resilient in the face of GPU hangs
If we hang the GPU and end up banning our context, we will no longer be able to submit and abort with an error (exit(1) no less). As we submit minimal incremental batches that rely on the logical context state of previous batches, we can not rely on the kernel's recovery mechanism which tries to restore the context back to a "golden" renderstate (the default HW setup) and replay the batches in flight. Instead, we must create a new context and set it up, including all the lost register settings that we only apply once during setup, before allow the user to continue rendering. The batches already submitted are lost (unrecoverable) so there will be a momentarily glitch and lost rendering across frames, but the application should be able to recover and continue on fairly oblivious. To make wedging even more likely, we use a new "no recovery" context parameter that tells the kernel to not even attempt to replay any batches in flight against the default context image, as experience shows the HW is not always robust enough to cope with the conflicting state. v2: Export brw_reset_state() to improve the amount of state we clobber on return to a starting context. (Kenneth) Cc: Kenneth Graunke Reviewed-by: Kenneth Graunke # pre-uapi split --- src/mesa/drivers/dri/i965/brw_bufmgr.c| 11 + src/mesa/drivers/dri/i965/brw_bufmgr.h| 1 + src/mesa/drivers/dri/i965/brw_context.h | 3 +++ src/mesa/drivers/dri/i965/brw_state_upload.c | 22 ++ src/mesa/drivers/dri/i965/intel_batchbuffer.c | 23 +++ 5 files changed, 56 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index b33a30930db..1248f8b9fa4 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -1621,6 +1621,17 @@ brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, return err; } +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = I915_CONTEXT_PARAM_PRIORITY, + }; + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p); + return p.value; /* on error, return 0 i.e. 
default priority */ +} + void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) { diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index 32fc7a553c9..9e80c2a831b 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -356,6 +356,7 @@ uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr); int brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id, int priority); +int brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id); void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 66fe5b3a8a0..4a306c4217a 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1647,6 +1647,9 @@ brw_get_graphics_reset_status(struct gl_context *ctx); void brw_check_for_reset(struct brw_context *brw); +void +brw_reset_state(struct brw_context *brw); + /* brw_compute.c */ extern void brw_init_compute_functions(struct dd_function_table *functions); diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c index 50049d325b3..b873cf1b58a 100644 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c @@ -228,12 +228,8 @@ brw_copy_pipeline_atoms(struct brw_context *brw, void brw_init_state( struct brw_context *brw ) { - struct gl_context *ctx = &brw->ctx; const struct gen_device_info *devinfo = &brw->screen->devinfo; - /* Force the first brw_select_pipeline to emit pipeline select */ - brw->last_pipeline = BRW_NUM_PIPELINES; - brw_init_caches(brw); if (devinfo->gen >= 11) @@ -257,6 +253,17 @@ void brw_init_state( struct brw_context *brw ) else gen4_init_atoms(brw); + brw_reset_state(brw); +} + + +void brw_reset_state( struct brw_context *brw ) +{ + struct gl_context *ctx = &brw->ctx; + + /* Force the first brw_select_pipeline to emit pipeline select */ + brw->last_pipeline = BRW_NUM_PIPELINES; + brw_upload_initial_gpu_state(brw); brw->NewGLState = ~0; @@ -267,6 +274,13 @@ void brw_init_state( struct brw_context *brw ) */ brw->pma_stall_bits = ~0; + brw->no_depth_or_stencil = false; + + brw->urb.vsize = 0; + brw->urb.gsize = 0; + brw->urb.hsize = 0; + brw->urb.dsize = 0; + /* Make sure that brw->ctx.NewDriverState has enough bits to hold all possible * dirty flags. */ diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i96
[Mesa-dev] [PATCH v2 2/3] i965: Add INTEL_DEBUG=hang
Introduce a new debug option to wilfully cause the GPU to hang and for the kernel to accuse of being neglectful. --- src/intel/Makefile.sources| 2 + src/intel/common/gen_debug.c | 1 + src/intel/common/gen_debug.h | 1 + src/intel/common/gen_hang.c | 176 ++ src/intel/common/gen_hang.h | 51 + src/intel/common/meson.build | 2 + src/mesa/drivers/dri/i965/intel_batchbuffer.c | 14 ++ 7 files changed, 247 insertions(+) create mode 100644 src/intel/common/gen_hang.c create mode 100644 src/intel/common/gen_hang.h diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources index 94a28d370e8..9058633abfc 100644 --- a/src/intel/Makefile.sources +++ b/src/intel/Makefile.sources @@ -18,6 +18,8 @@ COMMON_FILES = \ common/gen_disasm.h \ common/gen_defines.h \ common/gen_gem.h \ + common/gen_hang.c \ + common/gen_hang.h \ common/gen_l3_config.c \ common/gen_l3_config.h \ common/gen_urb_config.c \ diff --git a/src/intel/common/gen_debug.c b/src/intel/common/gen_debug.c index a978f2f5818..a4dd3965e13 100644 --- a/src/intel/common/gen_debug.c +++ b/src/intel/common/gen_debug.c @@ -85,6 +85,7 @@ static const struct debug_control debug_control[] = { { "nohiz", DEBUG_NO_HIZ }, { "color", DEBUG_COLOR }, { "reemit", DEBUG_REEMIT }, + { "hang",DEBUG_HANG }, { NULL,0 } }; diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h index 72d7ca20a39..49a93b87ebc 100644 --- a/src/intel/common/gen_debug.h +++ b/src/intel/common/gen_debug.h @@ -83,6 +83,7 @@ extern uint64_t INTEL_DEBUG; #define DEBUG_NO_HIZ (1ull << 39) #define DEBUG_COLOR (1ull << 40) #define DEBUG_REEMIT (1ull << 41) +#define DEBUG_HANG(1ull << 42) /* These flags are not compatible with the disk shader cache */ #define DEBUG_DISK_CACHE_DISABLE_MASK DEBUG_SHADER_TIME diff --git a/src/intel/common/gen_hang.c b/src/intel/common/gen_hang.c new file mode 100644 index 000..5f0dd4e0640 --- /dev/null +++ b/src/intel/common/gen_hang.c @@ -0,0 +1,176 @@ +/* + * Copyright © 2019 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +/** + * \file gen_hang.c + * + * Support for wilfully injecting GPU hangs. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include "drm-uapi/drm.h" +#include "drm-uapi/i915_drm.h" + +#include "gen_hang.h" + +static uint32_t __gem_create(int fd, uint64_t size) +{ + struct drm_i915_gem_create arg = { .size = size }; + drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &arg); + return arg.handle; +} + +static int __gem_set_caching(int fd, uint32_t handle, unsigned int caching) +{ + struct drm_i915_gem_caching arg = { .handle = handle, .caching = caching }; + return drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg) ? -errno : 0; +} + +static void * +__gem_mmap(int fd, uint32_t handle, uint64_t offset, uint64_t size) +{ + struct drm_i915_gem_mmap arg = { + .handle = handle, + .offset = offset, + .size = size, + .addr_ptr = -1, + }; + drmIoctl(fd, DRM_IOCTL_I915_GEM_MMAP, &arg); + return (void *)(uintptr_t)arg.addr_ptr; +} + +static void __gem_close(int fd, uint32_t handle) +{ + drmIoctl(fd, DRM_IOCTL_GEM_CLOSE, &handle); +} + +#define HANG_GENMASK 0xff +#define HANG_ALLOW_PREEMPTION (1 << 8) +#define HANG_IMMEDIATE (1 << 9) + +int +gen_inject_hang(int fd, uint32_t ctx, unsigned int flags) +{ + struct drm_i915_gem_relocation_entry reloc[2] = {}; + struct drm_i915_gem_exec_object2 obj = { + .handle = __gem_create(fd, 4096), + .relocation_count = 2, + .relocs_ptr = (uintptr_t)reloc, + }; + struct drm_i915_gem_execbuffer2 eb = { + .buffers_ptr = (uintp
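The intel_batchbuffer.c hunk that actually consumes the new flag is cut off above, so the following is only a guess at the shape of the hook, built from the pieces that are visible (the INTEL_DEBUG/DEBUG_HANG bit and the gen_inject_hang() entry point; the flag bits are left at zero). The usual invocation would be running an application with INTEL_DEBUG=hang set in the environment.

#include <stdint.h>
#include "common/gen_debug.h"
#include "common/gen_hang.h"

/* Hypothetical consumer of DEBUG_HANG -- the real hook in
 * intel_batchbuffer.c is truncated in this archive.  After a successful
 * submission, inject a wilfully hanging batch on the same context so the
 * kernel declares it guilty and the driver's recovery path gets exercised. */
static void
maybe_inject_hang(int fd, uint32_t ctx_id)
{
   if (!(INTEL_DEBUG & DEBUG_HANG))
      return;

   gen_inject_hang(fd, ctx_id, 0 /* default flags */);
}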
[Mesa-dev] [PATCH] i965: Be resilient in the face of GPU hangs
If we hang the GPU and end up banning our context, we will no longer be able to submit and abort with an error (exit(1) no less). As we submit minimal incremental batches that rely on the logical context state of previous batches, we can not rely on the kernel's recovery mechanism which tries to restore the context back to a "golden" renderstate (the default HW setup) and replay the batches in flight. Instead, we must create a new context and set it up, including all the lost register settings that we only apply once during setup, before allow the user to continue rendering. The batches already submitted are lost (unrecoverable) so there will be a momentarily glitch and lost rendering across frames, but the application should be able to recover and continue on fairly oblivious. To make wedging even more likely, we use a new "no recovery" context parameter that tells the kernel to not even attempt to replay any batches in flight against the default context image, as experience shows the HW is not always robust enough to cope with the conflicting state. v2: Export brw_reset_state() to improve the amount of state we clobber on return to a starting context. (Kenneth) Cc: Kenneth Graunke --- The intent was to refactor the existing brw_reset_state() out of brw_init_state() so that we could reuse, so reuse it! --- src/mesa/drivers/dri/i965/brw_bufmgr.c| 25 +++ src/mesa/drivers/dri/i965/brw_bufmgr.h| 2 ++ src/mesa/drivers/dri/i965/brw_context.h | 3 +++ src/mesa/drivers/dri/i965/brw_state_upload.c | 22 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 20 +++ 5 files changed, 67 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index b33a30930db..d8a9f0c450d 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -1589,6 +1589,16 @@ init_cache_buckets(struct brw_bufmgr *bufmgr) } } +static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = 0x8, // I915_CONTEXT_PARAM_RECOVERABLE, + }; + + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p); +} + uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr) { @@ -1599,6 +1609,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr) return 0; } + init_context(bufmgr, create.ctx_id); + return create.ctx_id; } @@ -1621,6 +1633,19 @@ brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, return err; } +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = I915_CONTEXT_PARAM_PRIORITY, + }; + + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p); + + return p.value; /* on error, return 0 i.e. 
default priority */ +} + void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) { diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index 32fc7a553c9..886b2e607ce 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -356,6 +356,8 @@ uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr); int brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id, int priority); +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id); void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 66fe5b3a8a0..4a306c4217a 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1647,6 +1647,9 @@ brw_get_graphics_reset_status(struct gl_context *ctx); void brw_check_for_reset(struct brw_context *brw); +void +brw_reset_state(struct brw_context *brw); + /* brw_compute.c */ extern void brw_init_compute_functions(struct dd_function_table *functions); diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c index 50049d325b3..a320c24edc5 100644 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c @@ -228,12 +228,8 @@ brw_copy_pipeline_atoms(struct brw_context *brw, void brw_init_state( struct brw_context *brw ) { - struct gl_context *ctx = &brw->ctx; const struct gen_device_info *devinfo = &brw->screen->devinfo; - /* Force the first brw_select_pipeline to emit pipeline select */ - brw->last_pipeline = BRW_NUM_PIPELINES; - brw_init_caches(brw); if (devinfo->gen >= 11) @@ -257,6 +253,16 @@ void brw_init_state( struct brw_context *brw ) else gen4_init_atoms(brw); + brw_reset_state(brw); +} + +void brw_reset_state( struct brw_context *br
[Mesa-dev] [PATCH v2 3/3] i965: Be resilient in the face of GPU hangs
If we hang the GPU and end up banning our context, we will no longer be able to submit and abort with an error (exit(1) no less). As we submit minimal incremental batches that rely on the logical context state of previous batches, we can not rely on the kernel's recovery mechanism which tries to restore the context back to a "golden" renderstate (the default HW setup) and replay the batches in flight. Instead, we must create a new context and set it up, including all the lost register settings that we only apply once during setup, before allow the user to continue rendering. The batches already submitted are lost (unrecoverable) so there will be a momentarily glitch and lost rendering across frames, but the application should be able to recover and continue on fairly oblivious. To make wedging even more likely, we use a new "no recovery" context parameter that tells the kernel to not even attempt to replay any batches in flight against the default context image, as experience shows the HW is not always robust enough to cope with the conflicting state. v2: Export brw_reset_state() to improve the amount of state we clobber on return to a starting context. (Kenneth) Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_bufmgr.c| 25 src/mesa/drivers/dri/i965/brw_bufmgr.h| 2 + src/mesa/drivers/dri/i965/brw_context.h | 3 ++ src/mesa/drivers/dri/i965/brw_state_upload.c | 40 +++ src/mesa/drivers/dri/i965/intel_batchbuffer.c | 20 ++ 5 files changed, 90 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index b33a30930db..d8a9f0c450d 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -1589,6 +1589,16 @@ init_cache_buckets(struct brw_bufmgr *bufmgr) } } +static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = 0x8, // I915_CONTEXT_PARAM_RECOVERABLE, + }; + + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p); +} + uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr) { @@ -1599,6 +1609,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr) return 0; } + init_context(bufmgr, create.ctx_id); + return create.ctx_id; } @@ -1621,6 +1633,19 @@ brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, return err; } +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = I915_CONTEXT_PARAM_PRIORITY, + }; + + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p); + + return p.value; /* on error, return 0 i.e. 
default priority */ +} + void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) { diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index 32fc7a553c9..886b2e607ce 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -356,6 +356,8 @@ uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr); int brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id, int priority); +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id); void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 66fe5b3a8a0..4a306c4217a 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1647,6 +1647,9 @@ brw_get_graphics_reset_status(struct gl_context *ctx); void brw_check_for_reset(struct brw_context *brw); +void +brw_reset_state(struct brw_context *brw); + /* brw_compute.c */ extern void brw_init_compute_functions(struct dd_function_table *functions); diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c index 50049d325b3..6ab0f1ce082 100644 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c @@ -284,6 +284,46 @@ void brw_init_state( struct brw_context *brw ) ctx->DriverFlags.NewIntelConservativeRasterization = BRW_NEW_CONSERVATIVE_RASTERIZATION; } +void brw_reset_state( struct brw_context *brw ) +{ + struct gl_context *ctx = &brw->ctx; + + /* Force the first brw_select_pipeline to emit pipeline select */ + brw->last_pipeline = BRW_NUM_PIPELINES; + + brw_upload_initial_gpu_state(brw); + + brw->NewGLState = ~0; + brw->ctx.NewDriverState = ~0ull; + + /* ~0 is a nonsensical value which won't match anything we program, so +* the programming will take effect on the first time around. +*/ + brw->pma_stall_bits = ~0; + + brw->no_depth_or_stencil = false; + + brw->urb.vsize = 0; + brw->u
[Mesa-dev] [PATCH v2 1/3] i965: Assert the execobject handles match for this device
Object handles are local to the device fd, so double check we are not mixing together objects from multiple screens on execbuf submission. Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index 0377c677c4c..8097392d22b 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -188,6 +188,8 @@ intel_batchbuffer_init(struct brw_context *brw) static unsigned add_exec_bo(struct intel_batchbuffer *batch, struct brw_bo *bo) { + assert(bo->bufmgr == batch->batch.bo->bufmgr); + unsigned index = READ_ONCE(bo->index); if (index < batch->exec_count && batch->exec_bos[index] == bo) -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
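The rationale for the assert is that GEM handles are only meaningful relative to the fd they were created on. A standalone illustration of that (independent of the patch; the render-node path and the lack of error handling are just for brevity): two opens of the same device get independent handle namespaces, so the first buffer created on each fd typically receives the same small handle value even though the buffers are unrelated, and mixing them in one execbuf would reference the wrong memory.

#include <fcntl.h>
#include <stdio.h>
#include <xf86drm.h>
#include "drm-uapi/i915_drm.h"

int main(void)
{
   /* Path is an assumption for the example; error handling omitted. */
   int fd_a = open("/dev/dri/renderD128", O_RDWR);
   int fd_b = open("/dev/dri/renderD128", O_RDWR);

   struct drm_i915_gem_create a = { .size = 4096 };
   struct drm_i915_gem_create b = { .size = 4096 };
   drmIoctl(fd_a, DRM_IOCTL_I915_GEM_CREATE, &a);
   drmIoctl(fd_b, DRM_IOCTL_I915_GEM_CREATE, &b);

   /* Both handles are usually 1, yet they name different buffers. */
   printf("fd_a handle %u, fd_b handle %u\n", a.handle, b.handle);
   return 0;
}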
Re: [Mesa-dev] [PATCH 2/2] i965: Be resilient in the face of GPU hangs
Quoting Chris Wilson (2019-02-14 12:05:00) > If we hang the GPU and end up banning our context, we will no longer be > able to submit and abort with an error (exit(1) no less). As we submit > minimal incremental batches that rely on the logical context state of > previous batches, we can not rely on the kernel's recovery mechanism > which tries to restore the context back to a "golden" renderstate (the > default HW setup) and replay the batches in flight. Instead, we must > create a new context and set it up, including all the lost register > settings that we only apply once during setup, before allow the user to > continue rendering. The batches already submitted are lost > (unrecoverable) so there will be a momentarily glitch and lost rendering > across frames, but the application should be able to recover and > continue on fairly oblivious. > > To make wedging even more likely, we use a new "no recovery" context > parameter that tells the kernel to not even attempt to replay any > batches in flight against the default context image, as experience shows > the HW is not always robust enough to cope with the conflicting state. > > Cc: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/brw_bufmgr.c| 25 +++ > src/mesa/drivers/dri/i965/brw_bufmgr.h| 2 ++ > src/mesa/drivers/dri/i965/intel_batchbuffer.c | 19 ++ > 3 files changed, 46 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > b/src/mesa/drivers/dri/i965/brw_bufmgr.c > index b33a30930db..289b39cd584 100644 > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > @@ -1589,6 +1589,16 @@ init_cache_buckets(struct brw_bufmgr *bufmgr) > } > } > > +static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) > +{ > + struct drm_i915_gem_context_param p = { > + .ctx_id = ctx_id, > + .param = 0x7, // I915_CONTEXT_PARAM_RECOVERABLE, > + }; > + > + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p); > +} > + > uint32_t > brw_create_hw_context(struct brw_bufmgr *bufmgr) > { > @@ -1599,6 +1609,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr) >return 0; > } > > + init_context(bufmgr, create.ctx_id); > + > return create.ctx_id; > } > > @@ -1621,6 +1633,19 @@ brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, > return err; > } > > +int > +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id) > +{ > + struct drm_i915_gem_context_param p = { > + .ctx_id = ctx_id, > + .param = I915_CONTEXT_PARAM_PRIORITY, > + }; > + > + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p); > + > + return p.value; /* on error, return 0 i.e. 
default priority */ > +} > + > void > brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) > { > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h > b/src/mesa/drivers/dri/i965/brw_bufmgr.h > index 32fc7a553c9..886b2e607ce 100644 > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h > @@ -356,6 +356,8 @@ uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr); > int brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, > uint32_t ctx_id, > int priority); > +int > +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id); > > void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id); > > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c > b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > index 8097392d22b..afb6e2401e3 100644 > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > @@ -748,6 +748,18 @@ execbuffer(int fd, > return ret; > } > > +static void recreate_context(struct brw_context *brw) > +{ > + struct brw_bufmgr *bufmgr = brw->bufmgr; > + int prio; > + > + prio = brw_hw_context_get_priority(bufmgr, brw->hw_ctx); > + brw_destroy_hw_context(bufmgr, brw->hw_ctx); > + > + brw->hw_ctx = brw_create_hw_context(bufmgr); > + brw_hw_context_set_priority(bufmgr, brw->hw_ctx, prio); Hmm, fwiw we can make this into a clone operation in the kernel. That way, we can also do things like move across the ppgtt. We will have the ability to that shortly... -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: Be resilient in the face of GPU hangs
If we hang the GPU and end up banning our context, we will no longer be able to submit and abort with an error (exit(1) no less). As we submit minimal incremental batches that rely on the logical context state of previous batches, we can not rely on the kernel's recovery mechanism which tries to restore the context back to a "golden" renderstate (the default HW setup) and replay the batches in flight. Instead, we must create a new context and set it up, including all the lost register settings that we only apply once during setup, before allow the user to continue rendering. The batches already submitted are lost (unrecoverable) so there will be a momentarily glitch and lost rendering across frames, but the application should be able to recover and continue on fairly oblivious. To make wedging even more likely, we use a new "no recovery" context parameter that tells the kernel to not even attempt to replay any batches in flight against the default context image, as experience shows the HW is not always robust enough to cope with the conflicting state. Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_bufmgr.c| 25 +++ src/mesa/drivers/dri/i965/brw_bufmgr.h| 2 ++ src/mesa/drivers/dri/i965/intel_batchbuffer.c | 19 ++ 3 files changed, 46 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index b33a30930db..289b39cd584 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -1589,6 +1589,16 @@ init_cache_buckets(struct brw_bufmgr *bufmgr) } } +static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = 0x7, // I915_CONTEXT_PARAM_RECOVERABLE, + }; + + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p); +} + uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr) { @@ -1599,6 +1609,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr) return 0; } + init_context(bufmgr, create.ctx_id); + return create.ctx_id; } @@ -1621,6 +1633,19 @@ brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, return err; } +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = I915_CONTEXT_PARAM_PRIORITY, + }; + + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p); + + return p.value; /* on error, return 0 i.e. 
default priority */ +} + void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) { diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index 32fc7a553c9..886b2e607ce 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -356,6 +356,8 @@ uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr); int brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id, int priority); +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id); void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id); diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index 8097392d22b..afb6e2401e3 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -748,6 +748,18 @@ execbuffer(int fd, return ret; } +static void recreate_context(struct brw_context *brw) +{ + struct brw_bufmgr *bufmgr = brw->bufmgr; + int prio; + + prio = brw_hw_context_get_priority(bufmgr, brw->hw_ctx); + brw_destroy_hw_context(bufmgr, brw->hw_ctx); + + brw->hw_ctx = brw_create_hw_context(bufmgr); + brw_hw_context_set_priority(bufmgr, brw->hw_ctx, prio); +} + static int submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd) { @@ -834,6 +846,13 @@ submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd) if (brw->ctx.Const.ResetStrategy == GL_LOSE_CONTEXT_ON_RESET_ARB) brw_check_for_reset(brw); + if (ret == -EIO) { + recreate_context(brw); + brw->ctx.NewDriverState |= BRW_NEW_CONTEXT; + brw_upload_invariant_state(brw); + ret = 0; + } + if (ret != 0) { fprintf(stderr, "i965: Failed to submit batchbuffer: %s\n", strerror(-ret)); -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Context restoration after GPU hangs
The kernel tries to repair a hanging context by restoring the default state (or else we have discovered that the context may be unusably corrupted by the reset). However, this is unsuitable for mesa as it (rightfully) assumes that the context image contains the state it has earlier set and so only emits incremental changes (e.g. it sets up register invariants once at context creation and thenceforth never has to touch those registers again). If we overwrite mesa's state with the default state, that too can lead to nasty hangs. Lose-lose. An alternative is that we tell the kernel not to bother trying to recover from the hang, but report back to userspace that the context is lost immediately and leave it to mesa to do its own recovery. This patch provides the minimum that should do the trick, but there's probably a lot of stones that need to be turned over to find the residual state that isn't being reset (since BRW_NEW_CONTEXT went out of fashion ;) I'd like to get this concept acked so that we can land the corresponding kernel patch and make the uAPI official and start putting it to use. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
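Pulling the series together, the recovery flow amounts to the sketch below. It is a condensed restatement of the -EIO handling in the accompanying patches (assuming the i965 headers and the brw_hw_context_get_priority() helper those patches add); error handling is elided.

/* Condensed from the patch: when execbuf reports the context is lost
 * (-EIO, with the no-recovery parameter set at context creation), throw
 * the banned hardware context away, build a fresh one with the same
 * priority, and re-emit the invariant state that is otherwise only
 * uploaded once at context creation. */
static int
recover_after_hang(struct brw_context *brw, int ret)
{
   struct brw_bufmgr *bufmgr = brw->bufmgr;

   if (ret != -EIO)
      return ret;

   int prio = brw_hw_context_get_priority(bufmgr, brw->hw_ctx);
   brw_destroy_hw_context(bufmgr, brw->hw_ctx);
   brw->hw_ctx = brw_create_hw_context(bufmgr);
   brw_hw_context_set_priority(bufmgr, brw->hw_ctx, prio);

   brw->ctx.NewDriverState |= BRW_NEW_CONTEXT;
   brw_upload_invariant_state(brw);

   return 0; /* submitted batches are lost; new rendering may proceed */
}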
[Mesa-dev] [PATCH 1/2] i965: Assert the execobject handles match for this device
Object handles are local to the device fd, so double check we are not mixing together objects from multiple screens on execbuf submission. Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index 0377c677c4c..8097392d22b 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -188,6 +188,8 @@ intel_batchbuffer_init(struct brw_context *brw) static unsigned add_exec_bo(struct intel_batchbuffer *batch, struct brw_bo *bo) { + assert(bo->bufmgr == batch->batch.bo->bufmgr); + unsigned index = READ_ONCE(bo->index); if (index < batch->exec_count && batch->exec_bos[index] == bo) -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965: add missing rollback of URB requirements
Quoting Kenneth Graunke (2019-01-08 20:17:01) > On Tuesday, January 8, 2019 3:11:37 AM PST Chris Wilson wrote: > > Quoting Lionel Landwerlin (2019-01-08 11:03:26) > > > Hi Andrii, > > > > > > Although I think what these patches do makes sense, I think it's missing > > > the bigger picture. > > > There is a lot more state that gets lost if we have to revert all of the > > > emitted commands. > > > A quick look at brw_upload_pipeline_state() shows all of the programs > > > could be invalid as well, or the number of samples, etc... > > > > > > To me it seems like we need a much larger state save/restore. > > > > > > Ken: what do you think? > > > > How about not rolling back? If we accept it as currently broken, and just > > demand the kernel provide logical contexts for all i965 devices (just ack > > some patches!), and then just flush the batch (possibly with a dummy 3D > > prim if you want to be sure the 3D state is loaded) and rely on the > > context preserving state across batches. > > -Chris > > I'm not sure precisely what you're proposing, but every scenario I can > think of is breaking in some subtle way. > > Option 1: Submit batch at last 3DPRIMITIVE, re-call emit in a new batch. > > This is what we do today; it doesn't work since brw_upload_render_state > implicitly sets CPU-side fields for "current state of the GPU", which > become inaccurate at rollback, so the next state upload would do the > wrong thing. We'd need to roll all those back as well, or only commit > them when we finish uploading state. Atoms also flag other new atoms, > and that's not getting rolled back, but extra flagging is harmless. > > Fixing this means disentangling some of our code, saving and restoring > more values, and so on. Maybe dropping a few CPU micro-optimizations. > > Option 2: Submit batch at last 3DPRIMITIVE (A), memcpy commands for > for failing primitive (B) into a new batch. > > This doesn't work either. Let's say (A) issued a 3DSTATE_INDEX_BUFFER > command with a pointer to the index buffer. (B) didn't change index > buffer state, so it doesn't emit one, intending to reuse the old value. This with all relocs with BRW_NEW_BATCH, and iirc the rule is that they are already tied into that BRW_NEW_BATCH flag. Except you don't need to memcpy, since you already have the 3DSTATE equivalent to the cpu state tracker in the batch, emit that, and it will be recorded in the context for the next batch. (You just don't tell the kernel about the buffers/relocs that cause the aperture exhaustion and it won't do anything about them. The GPU may load stale addresses, but never uses them, exactly like the context state on the next batch anyway.) It's not a great argument, but the argument is that there's less you need to fixup, than having to remember to every single comparison in order to rollback the cpu state tracker. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965: add missing rollback of URB requirements
Quoting andrey simiklit (2019-01-08 16:00:45) > On Tue, Jan 8, 2019 at 1:11 PM Chris Wilson wrote: > > Quoting Lionel Landwerlin (2019-01-08 11:03:26) > > Hi Andrii, > > > > Although I think what these patches do makes sense, I think it's missing > > the bigger picture. > > There is a lot more state that gets lost if we have to revert all of the > > emitted commands. > > A quick look at brw_upload_pipeline_state() shows all of the programs > > could be invalid as well, or the number of samples, etc... > > > > To me it seems like we need a much larger state save/restore. > > > > Ken: what do you think? > > How about not rolling back? If we accept it as currently broken, and just > demand the kernel provide logical contexts for all i965 devices (just ack > some patches!), and then just flush the batch (possibly with a dummy 3D > prim if you want to be sure the 3D state is loaded) and rely on the > context preserving state across batches. > -Chris > > > Could you please provide a link to the patches you talking about? I just need an ack for the kernel patches to enable context support. > At the moment as far as I understood the rollback system > helps to reduce quantity of the flush operations in some lucky cases. > When there isn't enough space for a batch aperture, > we will rollback the batch to the saved state, > when limit (means aperture_threshold) wasn't reached by it. > > pseudo code (simple variant, without error handling): > save batch state here; > do > { > add primitives to the batch; > if the batch size limit is exceeded > { > rollback to the saved batch state; > flush the batch; > } > else > { > break; > } > } while(true); > > Are you suggesting to flush the batch every time without waiting for a nearly > full filling of it? > Like this: > add primitives to batch; > flush batch; The suggestion was just thinking about if we detect that this primitive would exceed the aperture, instead of rolling back the batch to before the primitive, unroll the objects/relocs (so that we don't trigger ENOSPC from the kernel) but keep the 3DSTATE without the final PRIM and submit that. Basically it's all about removing no_batch_wrap. Quite icky. All it saves is having to track the incidental details like URB, but you still have to re-emit all the surface state (but that's already a given as it is stored alongside the batch). However, that's all you have to remember. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
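To make the pseudo code in that mail concrete, here is a self-contained toy model of the save/rollback/flush loop. The "batch" is just a byte counter with a hard limit; none of this is the real i965 batch API, it only mirrors the shape of the loop being discussed.

#include <stdbool.h>
#include <stdio.h>

struct toy_batch { unsigned used; unsigned limit; };

static void toy_flush(struct toy_batch *b)
{
   printf("flush %u bytes\n", b->used);
   b->used = 0;
}

static bool toy_emit_prim(struct toy_batch *b, unsigned prim_size)
{
   for (;;) {
      unsigned saved = b->used;   /* save batch state here */

      b->used += prim_size;       /* add primitives to the batch */
      if (b->used <= b->limit)
         return true;             /* it fit: done */

      b->used = saved;            /* rollback to the saved batch state */
      if (saved == 0)
         return false;            /* can never fit, give up */
      toy_flush(b);               /* flush the batch, retry in a fresh one */
   }
}

int main(void)
{
   struct toy_batch b = { .used = 3000, .limit = 4096 };
   toy_emit_prim(&b, 2000);       /* forces one rollback + flush */
   return 0;
}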
Re: [Mesa-dev] [PATCH 1/2] i965: add missing rollback of URB requirements
Quoting Lionel Landwerlin (2019-01-08 11:03:26) > Hi Andrii, > > Although I think what these patches do makes sense, I think it's missing > the bigger picture. > There is a lot more state that gets lost if we have to revert all of the > emitted commands. > A quick look at brw_upload_pipeline_state() shows all of the programs > could be invalid as well, or the number of samples, etc... > > To me it seems like we need a much larger state save/restore. > > Ken: what do you think? How about not rolling back? If we accept it as currently broken, and just demand the kernel provide logical contexts for all i965 devices (just ack some patches!), and then just flush the batch (possibly with a dummy 3D prim if you want to be sure the 3D state is loaded) and rely on the context preserving state across batches. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Be resilient in the face of GPU hangs
Quoting Chris Wilson (2018-10-24 09:40:08) > If we hang the GPU and end up banning our context, we will no longer be > able to submit and abort with an error (exit(1) no less). As we submit > minimal incremental batches that rely on the logical context state of > previous batches, we can not rely on the kernel's recovery mechanism > which tries to restore the context back to a "golden" renderstate (the > default HW setup) and replay the batches in flight. Instead, we must > create a new context and set it up, including all the lost register > settings that we only apply once during setup, before allow the user to > continue rendering. The batches already submitted are lost > (unrecoverable) so there will be a momentarily glitch and lost rendering > across frames, but the application should be able to recover and > continue on fairly oblivious. > > To make wedging even more likely, we use a new "no recovery" context > parameter that tells the kernel to not even attempt to replay any > batches in flight against the default context image, as experience shows > the HW is not always robust enough to cope with the conflicting state. > > Cc: Kenneth Graunke So, give or take some forgotten state that is not reset on context reload, what do you think about this as a stepping stone to handling GPU resets robustly? > --- > src/mesa/drivers/dri/i965/brw_bufmgr.c| 25 +++ > src/mesa/drivers/dri/i965/brw_bufmgr.h| 2 ++ > src/mesa/drivers/dri/i965/intel_batchbuffer.c | 19 ++ > 3 files changed, 46 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > b/src/mesa/drivers/dri/i965/brw_bufmgr.c > index f1675b191c1..328393e2ade 100644 > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > @@ -1589,6 +1589,16 @@ init_cache_buckets(struct brw_bufmgr *bufmgr) > } > } > > +static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) > +{ > + struct drm_i915_gem_context_param p = { > + .ctx_id = ctx_id, > + .param = 0x7, // I915_CONTEXT_PARAM_RECOVERABLE, > + }; > + > + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p); > +} > + > uint32_t > brw_create_hw_context(struct brw_bufmgr *bufmgr) > { > @@ -1599,6 +1609,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr) >return 0; > } > > + init_context(bufmgr, create.ctx_id); > + > return create.ctx_id; > } > > @@ -1621,6 +1633,19 @@ brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, > return err; > } > > +int > +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id) > +{ > + struct drm_i915_gem_context_param p = { > + .ctx_id = ctx_id, > + .param = I915_CONTEXT_PARAM_PRIORITY, > + }; > + > + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p); > + > + return p.value; /* on error, return 0 i.e. 
default priority */ > +} > + > void > brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) > { > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h > b/src/mesa/drivers/dri/i965/brw_bufmgr.h > index 32fc7a553c9..886b2e607ce 100644 > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h > @@ -356,6 +356,8 @@ uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr); > int brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, > uint32_t ctx_id, > int priority); > +int > +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id); > > void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id); > > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c > b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > index 4363b146150..73c2bbab18e 100644 > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > @@ -735,6 +735,18 @@ execbuffer(int fd, > return ret; > } > > +static void recreate_context(struct brw_context *brw) > +{ > + struct brw_bufmgr *bufmgr = brw->bufmgr; > + int prio; > + > + prio = brw_hw_context_get_priority(bufmgr, brw->hw_ctx); > + brw_destroy_hw_context(bufmgr, brw->hw_ctx); > + > + brw->hw_ctx = brw_create_hw_context(bufmgr); > + brw_hw_context_set_priority(bufmgr, brw->hw_ctx, prio); > +} > + > static int > submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd) > { > @@ -821,6 +833,13 @@ submit_batch(struct brw_context *brw, int in_fence_fd, > int *out_fence_fd) > if (brw->ctx.Const
Re: [Mesa-dev] [PATCH 3/6] mesa/st: Factor out array and buffer setup from st_atom_array.c.
Quoting Mathias Fröhlich (2018-11-23 17:14:45) > Hi Chris, > > On Friday, 23 November 2018 16:12:38 CET Chris Wilson wrote: > > > > Something to note here is that valgrind reports > > (piglit/bin/drawoverhead): > > > > ==492== Use of uninitialised value of size 8 > > ==492==at 0x6EB8FA9: cso_hash_find_node (cso_hash.c:198) > > ==492==by 0x6EB9026: cso_hash_insert (cso_hash.c:213) > > ==492==by 0x6EB6730: cso_set_vertex_elements (cso_context.c:1092) > > ==492==by 0x714C76F: set_vertex_attribs (st_atom_array.c:384) > > ==492==by 0x714C76F: st_update_array (st_atom_array.c:512) > > ==492==by 0x71073F3: st_validate_state (st_atom.c:261) > > ==492==by 0x70615D1: prepare_draw (st_draw.c:123) > > ==492==by 0x70615D1: st_draw_vbo (st_draw.c:149) > > ==492==by 0x70F5BF4: _mesa_validated_drawrangeelements (draw.c:850) > > ==492==by 0x70F5BF4: _mesa_validated_drawrangeelements (draw.c:782) > > ==492==by 0x70F5F32: _mesa_DrawElements (draw.c:1004) > > ==492==by 0x48F6C74: stub_glDrawElements (piglit-dispatch-gen.c:12618) > > ==492==by 0x10B3F4: draw (drawoverhead.c:275) > > ==492==by 0x10D070: perf_measure_rate (common.c:56) > > ==492==by 0x10C69E: perf_run (drawoverhead.c:645) > > ==492== Uninitialised value was created by a stack allocation > > ==492==at 0x714C25D: st_update_array (st_atom_array.c:389) > > > > from velements being used sparsely. > > > > Does your patch prevent this? > > I tried to reproduce this, but valgrind does not show any failures with > drawoverhead > on radeonsi. > What driver backend do you use? iris, but we don't hit backends before the error on this path. It may just be a bug fixed not yet merged into iris. Hmm, or a local patch, perhaps: commit 3492e5dfd8e8a7b0f37a3be2a18010628d46f695 Author: Kenneth Graunke Date: Fri Aug 24 15:06:35 2018 -0700 gallium, st/mesa: Support edge flags as a vertex element property Thanks for checking. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
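For what it's worth, the valgrind report boils down to velements being a stack array that is written sparsely and then consumed (hashed by cso) as a whole. Whether clearing it is the right fix for st_atom_array.c is a separate question; the snippet below only illustrates the general pattern, using a toy stand-in struct rather than the real pipe_vertex_element:

#include <string.h>

#define MAX_ATTRIBS 32

/* Toy stand-in for pipe_vertex_element -- illustration only. */
struct toy_velement {
   unsigned src_offset;
   unsigned vertex_buffer_index;
   unsigned src_format;
};

/* If the whole array (including unused slots and struct padding) is later
 * hashed or memcmp'd, it has to be fully defined even when only a few
 * attributes are written, hence the up-front clear before sparse writes. */
static void
setup_sparse_elements(struct toy_velement *out, unsigned enabled_count)
{
   memset(out, 0, sizeof(*out) * MAX_ATTRIBS);

   for (unsigned i = 0; i < enabled_count; i++) {
      out[i].vertex_buffer_index = i;
      out[i].src_format = 1 + i;   /* whatever format applies */
   }
}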
Re: [Mesa-dev] [PATCH 3/6] mesa/st: Factor out array and buffer setup from st_atom_array.c.
Quoting mathias.froehl...@gmx.net (2018-11-23 08:07:29) > From: Mathias Fröhlich > > Factor out vertex array setup routines from the array state atom. > The factored functions will be used in feedback rendering in the > next change. > > Signed-off-by: Mathias Fröhlich > --- > src/mesa/state_tracker/st_atom.h | 17 ++ > src/mesa/state_tracker/st_atom_array.c | 79 +++--- > 2 files changed, 74 insertions(+), 22 deletions(-) > > diff --git a/src/mesa/state_tracker/st_atom.h > b/src/mesa/state_tracker/st_atom.h > index 9f3ca38c19..901e9b6d43 100644 > --- a/src/mesa/state_tracker/st_atom.h > +++ b/src/mesa/state_tracker/st_atom.h > @@ -37,6 +37,10 @@ > #include "main/glheader.h" > > struct st_context; > +struct st_vertex_program; > +struct st_vp_variant; > +struct pipe_vertex_buffer; > +struct pipe_vertex_element; > > /** > * Enumeration of state tracker pipelines. > @@ -57,6 +61,19 @@ GLuint st_compare_func_to_pipe(GLenum func); > enum pipe_format > st_pipe_vertex_format(const struct gl_vertex_format *glformat); > > +void > +st_setup_arrays(struct st_context *st, > +const struct st_vertex_program *vp, > +const struct st_vp_variant *vp_variant, > +struct pipe_vertex_element *velements, > +struct pipe_vertex_buffer *vbuffer, unsigned *num_vbuffers); > + > +void > +st_setup_current(struct st_context *st, > + const struct st_vertex_program *vp, > + const struct st_vp_variant *vp_variant, > + struct pipe_vertex_element *velements, > + struct pipe_vertex_buffer *vbuffer, unsigned *num_vbuffers); > > /* Define ST_NEW_xxx_INDEX */ > enum { > diff --git a/src/mesa/state_tracker/st_atom_array.c > b/src/mesa/state_tracker/st_atom_array.c > index cd00529ddf..ac9bd727df 100644 > --- a/src/mesa/state_tracker/st_atom_array.c > +++ b/src/mesa/state_tracker/st_atom_array.c > @@ -384,25 +384,17 @@ set_vertex_attribs(struct st_context *st, > } > > void > -st_update_array(struct st_context *st) > +st_setup_arrays(struct st_context *st, > +const struct st_vertex_program *vp, > +const struct st_vp_variant *vp_variant, > +struct pipe_vertex_element *velements, > +struct pipe_vertex_buffer *vbuffer, unsigned *num_vbuffers) > { > struct gl_context *ctx = st->ctx; > - /* vertex program validation must be done before this */ > - const struct st_vertex_program *vp = st->vp; > - /* _NEW_PROGRAM, ST_NEW_VS_STATE */ > - const GLbitfield inputs_read = st->vp_variant->vert_attrib_mask; > const struct gl_vertex_array_object *vao = ctx->Array._DrawVAO; > + const GLbitfield inputs_read = vp_variant->vert_attrib_mask; > const ubyte *input_to_index = vp->input_to_index; > > - struct pipe_vertex_buffer vbuffer[PIPE_MAX_ATTRIBS]; > - struct pipe_vertex_element velements[PIPE_MAX_ATTRIBS]; > - unsigned num_vbuffers = 0; > - > - st->vertex_array_out_of_memory = FALSE; > - st->draw_needs_minmax_index = false; > - > - /* _NEW_PROGRAM */ > - /* ST_NEW_VERTEX_ARRAYS alias ctx->DriverFlags.NewArray */ > /* Process attribute array data. 
*/ > GLbitfield mask = inputs_read & _mesa_draw_array_bits(ctx); > while (mask) { > @@ -410,7 +402,7 @@ st_update_array(struct st_context *st) >const gl_vert_attrib i = ffs(mask) - 1; >const struct gl_vertex_buffer_binding *const binding > = _mesa_draw_buffer_binding(vao, i); > - const unsigned bufidx = num_vbuffers++; > + const unsigned bufidx = (*num_vbuffers)++; > >if (_mesa_is_bufferobj(binding->BufferObj)) { > struct st_buffer_object *stobj = > st_buffer_object(binding->BufferObj); > @@ -452,16 +444,28 @@ st_update_array(struct st_context *st) > input_to_index[attr]); >} > } > +} > + > +void > +st_setup_current(struct st_context *st, > + const struct st_vertex_program *vp, > + const struct st_vp_variant *vp_variant, > + struct pipe_vertex_element *velements, > + struct pipe_vertex_buffer *vbuffer, unsigned *num_vbuffers) > +{ > + struct gl_context *ctx = st->ctx; > + const GLbitfield inputs_read = vp_variant->vert_attrib_mask; > > - const unsigned first_current_vbuffer = num_vbuffers; > - /* _NEW_PROGRAM | _NEW_CURRENT_ATTRIB */ > /* Process values that should have better been uniforms in the > application */ > GLbitfield curmask = inputs_read & _mesa_draw_current_bits(ctx); > if (curmask) { > + /* vertex program validation must be done before this */ > + const struct st_vertex_program *vp = st->vp; > + const ubyte *input_to_index = vp->input_to_index; >/* For each attribute, upload the maximum possible size. */ >GLubyte data[VERT_ATTRIB_MAX * sizeof(GLdouble) * 4]; >GLubyte *cursor = data; > - c
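The commit message says the factored helpers are destined for feedback rendering in a follow-up change. Going only by the prototypes added to st_atom.h in the hunk above, the intended call pattern is presumably something like the hypothetical caller below (the function shown is not in the patch; it just demonstrates how the two helpers compose):

/* Hypothetical caller, sketched from the st_atom.h prototypes above: a
 * feedback path would gather the same vertex buffers/elements the normal
 * draw path does, without going through the st_update_array atom. */
static void
setup_feedback_vertex_state(struct st_context *st,
                            const struct st_vertex_program *vp,
                            const struct st_vp_variant *vp_variant)
{
   struct pipe_vertex_buffer vbuffer[PIPE_MAX_ATTRIBS];
   struct pipe_vertex_element velements[PIPE_MAX_ATTRIBS];
   unsigned num_vbuffers = 0;

   /* Regular vertex arrays first... */
   st_setup_arrays(st, vp, vp_variant, velements, vbuffer, &num_vbuffers);
   /* ...then the "current value" attributes that have no array bound. */
   st_setup_current(st, vp, vp_variant, velements, vbuffer, &num_vbuffers);

   /* From here the caller would hand vbuffer/velements to the draw code. */
}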
Re: [Mesa-dev] [PATCH v2 2/3] i965/gen10+: Enable object level preemption.
Quoting Rafael Antognolli (2018-10-29 17:19:53) > +void > +brw_enable_obj_preemption(struct brw_context *brw, bool enable) > +{ > + const struct gen_device_info *devinfo = &brw->screen->devinfo; > + assert(devinfo->gen >= 9); > + > + if (enable == brw->object_preemption) > + return; > + > + /* A fixed function pipe flush is required before modifying this field */ > + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE); > + > + bool replay_mode = enable ? > + GEN9_REPLAY_MODE_MIDOBJECT : GEN9_REPLAY_MODE_MIDBUFFER; > + > + /* enable object level preemption */ > + brw_load_register_imm32(brw, CS_CHICKEN1, > + replay_mode | GEN9_REPLAY_MODE_MASK); > + > + brw->object_preemption = enable; > +} > + > static void > brw_upload_initial_gpu_state(struct brw_context *brw) > { > @@ -153,6 +175,9 @@ brw_upload_initial_gpu_state(struct brw_context *brw) > ADVANCE_BATCH(); >} > } > + > + if (devinfo->gen >= 10) brw->object_preemption = false; > + brw_enable_obj_preemption(brw, true); To force the LRI despite what the context may believe. (To accommodate recreating a logical context following a GPU hang.) -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
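On why the replay mode value is OR'd with GEN9_REPLAY_MODE_MASK in the quoted hunk: CS_CHICKEN1 appears to follow the usual masked-workaround-register convention, where bits 31:16 act as per-bit write enables for bits 15:0, so a plain value without its mask half set would be silently ignored. The macro names and the bit position below are illustrative assumptions, not the brw_defines.h spellings:

#include <stdint.h>

#define REPLAY_MODE_BIT        (1u << 0)            /* assumed bit position */
#define REPLAY_MODE_WRITE_EN   (REPLAY_MODE_BIT << 16)

/* Same shape as "replay_mode | GEN9_REPLAY_MODE_MASK" in the patch: the
 * high half unlocks the field, the low half carries the new setting. */
static uint32_t
cs_chicken1_replay_value(int midobject)
{
   return (midobject ? REPLAY_MODE_BIT : 0) | REPLAY_MODE_WRITE_EN;
}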
[Mesa-dev] [PATCH] i965: Be resilient in the face of GPU hangs
If we hang the GPU and end up banning our context, we will no longer be able to submit and abort with an error (exit(1) no less). As we submit minimal incremental batches that rely on the logical context state of previous batches, we can not rely on the kernel's recovery mechanism which tries to restore the context back to a "golden" renderstate (the default HW setup) and replay the batches in flight. Instead, we must create a new context and set it up, including all the lost register settings that we only apply once during setup, before allow the user to continue rendering. The batches already submitted are lost (unrecoverable) so there will be a momentarily glitch and lost rendering across frames, but the application should be able to recover and continue on fairly oblivious. To make wedging even more likely, we use a new "no recovery" context parameter that tells the kernel to not even attempt to replay any batches in flight against the default context image, as experience shows the HW is not always robust enough to cope with the conflicting state. Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_bufmgr.c| 25 +++ src/mesa/drivers/dri/i965/brw_bufmgr.h| 2 ++ src/mesa/drivers/dri/i965/intel_batchbuffer.c | 19 ++ 3 files changed, 46 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index f1675b191c1..328393e2ade 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -1589,6 +1589,16 @@ init_cache_buckets(struct brw_bufmgr *bufmgr) } } +static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = 0x7, // I915_CONTEXT_PARAM_RECOVERABLE, + }; + + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p); +} + uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr) { @@ -1599,6 +1609,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr) return 0; } + init_context(bufmgr, create.ctx_id); + return create.ctx_id; } @@ -1621,6 +1633,19 @@ brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, return err; } +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id) +{ + struct drm_i915_gem_context_param p = { + .ctx_id = ctx_id, + .param = I915_CONTEXT_PARAM_PRIORITY, + }; + + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p); + + return p.value; /* on error, return 0 i.e. 
default priority */ +} + void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id) { diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index 32fc7a553c9..886b2e607ce 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -356,6 +356,8 @@ uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr); int brw_hw_context_set_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id, int priority); +int +brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id); void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id); diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index 4363b146150..73c2bbab18e 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -735,6 +735,18 @@ execbuffer(int fd, return ret; } +static void recreate_context(struct brw_context *brw) +{ + struct brw_bufmgr *bufmgr = brw->bufmgr; + int prio; + + prio = brw_hw_context_get_priority(bufmgr, brw->hw_ctx); + brw_destroy_hw_context(bufmgr, brw->hw_ctx); + + brw->hw_ctx = brw_create_hw_context(bufmgr); + brw_hw_context_set_priority(bufmgr, brw->hw_ctx, prio); +} + static int submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd) { @@ -821,6 +833,13 @@ submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd) if (brw->ctx.Const.ResetStrategy == GL_LOSE_CONTEXT_ON_RESET_ARB) brw_check_for_reset(brw); + if (ret == -EIO) { + recreate_context(brw); + brw->ctx.NewDriverState |= BRW_NEW_CONTEXT; + brw_upload_invariant_state(brw); + ret = 0; + } + if (ret != 0) { fprintf(stderr, "i965: Failed to submit batchbuffer: %s\n", strerror(-ret)); -- 2.19.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
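On the application side, this driver-side recovery pairs with GL robustness: a client that asked for reset notification can poll for context loss and rebuild its own state instead of being killed by the old exit(1) path. A minimal sketch using the ARB_robustness query (hypothetical application code, not part of the patch; in practice the entry point is fetched via glXGetProcAddress/eglGetProcAddress):

    static bool check_for_gpu_reset(void)
    {
       /* Needs a context created with reset notification enabled. */
       switch (glGetGraphicsResetStatusARB()) {
       case GL_NO_ERROR:
          return false;                    /* nothing happened */
       case GL_GUILTY_CONTEXT_RESET_ARB:   /* our rendering hung the GPU */
       case GL_INNOCENT_CONTEXT_RESET_ARB: /* someone else's rendering did */
       case GL_UNKNOWN_CONTEXT_RESET_ARB:
       default:
          return true;                     /* recreate the context and resources */
       }
    }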
Re: [Mesa-dev] [RFC] docs: Add a copyright.c template we can copy when making new files.
Quoting Kenneth Graunke (2018-10-19 18:51:36) > Usually when making a new file, people copy some random other file > to get the copyright header comments. Unfortunately, some of them > are commented in a decades-old style, are word wrapped poorly, or > worse, have a few subtle variations in the text. While we've tried > to clean those up, we're not going to get every copy to be perfect. > > Instead, this commit adds docs/copyright.c, which contains a copy of > the license header which is well-formatted and has the correct text. > The idea is that you can start from this when making a new file, which > should help with consistency. > --- > docs/copyright.c | 22 ++ > 1 file changed, 22 insertions(+) > create mode 100644 docs/copyright.c > > Hey all, > > I noticed when writing my new Iris driver that I had a couple subtle > variations of copyright headers creep in, even in a brand new project. > Mostly word wrapping differences. To combat that, I made a copyright.c > and made sure to use it when I created new files. It seemed to help. > > So, the thinking is to just actually put that in the project under docs. > Maybe it helps other people as well? May I suggest spdx.org and in particular using a format such as /* SPDX: MIT */ with a toplevel description of what the link means. From a dev point of view, it's much quicker to see what licence variant is being used and even harder for mistakes to creep in. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
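For reference, the canonical spelling of such a tag (as standardised by SPDX and used throughout the Linux kernel) is a single comment on the first line of each file:

    /* SPDX-License-Identifier: MIT */

with the full licence text kept once in the tree (for example in a LICENSES/ directory) rather than repeated per file; the "/* SPDX: MIT */" form quoted above is a paraphrase of that idea.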
Re: [Mesa-dev] [PATCH] dri: Include kernel release in vendor info
Quoting Ian Romanick (2018-10-10 00:24:00) > On 10/09/2018 06:24 AM, Chris Wilson wrote: > > The userspace driver does not exist in isolation and occasionally > > depends on kernel uapi, and so it is useful in bug reports to include > > that information. (radeonsi, r600 and radv already include utsname) > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=108282 > > --- > > src/mesa/drivers/dri/common/utils.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/src/mesa/drivers/dri/common/utils.c > > b/src/mesa/drivers/dri/common/utils.c > > index 5a66bcf8e05..4d0d654dbfd 100644 > > --- a/src/mesa/drivers/dri/common/utils.c > > +++ b/src/mesa/drivers/dri/common/utils.c > > @@ -34,6 +34,8 @@ > > #include > > #include > > #include > > +#include > > + > > #include "main/macros.h" > > #include "main/mtypes.h" > > #include "main/cpuinfo.h" > > @@ -77,11 +79,15 @@ unsigned > > driGetRendererString( char * buffer, const char * hardware_name, > > GLuint agp_mode ) > > { > > + struct utsname uts; > > unsigned offset; > > char *cpu; > > > > offset = sprintf( buffer, "Mesa DRI %s", hardware_name ); > > > > + if (uname(&uts) == 0) > > + offset += sprintf(buffer + offset, " [%s]", uts.release); > > Is there any kind of a limit on how big uts.release can be? All of the > callers pass a fixed-size buffer into the function, and the function has > no way to know how big that buffer is. Currently defined at 64 + '\0'. The function can always be replaced by one that does know buflen. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] dri: Include kernel release in vendor info
The userspace driver does not exist in isolation and occasionally depends on kernel uapi, and so it is useful in bug reports to include that information. (radeonsi, r600 and radv already include utsname) References: https://bugs.freedesktop.org/show_bug.cgi?id=108282 --- src/mesa/drivers/dri/common/utils.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/common/utils.c b/src/mesa/drivers/dri/common/utils.c index 5a66bcf8e05..4d0d654dbfd 100644 --- a/src/mesa/drivers/dri/common/utils.c +++ b/src/mesa/drivers/dri/common/utils.c @@ -34,6 +34,8 @@ #include #include #include +#include + #include "main/macros.h" #include "main/mtypes.h" #include "main/cpuinfo.h" @@ -77,11 +79,15 @@ unsigned driGetRendererString( char * buffer, const char * hardware_name, GLuint agp_mode ) { + struct utsname uts; unsigned offset; char *cpu; offset = sprintf( buffer, "Mesa DRI %s", hardware_name ); + if (uname(&uts) == 0) + offset += sprintf(buffer + offset, " [%s]", uts.release); + /* Append any AGP-specific information. */ switch ( agp_mode ) { -- 2.19.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/9] i965: Pack simple pipelined query objects into the same buffer
Quoting Kenneth Graunke (2018-10-06 02:57:29) > On Tuesday, October 2, 2018 11:06:23 AM PDT Chris Wilson wrote: > > Reuse the same query object buffer for multiple queries within the same > > batch. > > > > A task for the future is propagating the GL_NO_MEMORY errors. > > > > Signed-off-by: Chris Wilson > > Cc: Kenneth Graunke > > Cc: Matt Turner > > --- > > src/mesa/drivers/dri/i965/brw_context.c | 3 ++ > > src/mesa/drivers/dri/i965/brw_context.h | 10 +++-- > > src/mesa/drivers/dri/i965/brw_queryobj.c | 16 +++ > > src/mesa/drivers/dri/i965/gen6_queryobj.c | 51 ++- > > 4 files changed, 59 insertions(+), 21 deletions(-) > > Don't want to do this. This means that WaitQuery will wait on the whole > group of packed queries instead of just the one the app asked about. > > Vulkan has to pack queries by design, and it turns out this was a real > issue there. See b2c97bc789198427043cd902bc76e194e7e81c7d. Jason ended > up making it busy-wait to avoid bo_waiting on the entire pool, and it > improved Serious Engine performance by 20%. > > We could busy-wait in GL too, for lower latency but more CPU waste, > but I think I prefer separate BOs + bo_wait. It's the same latency for wait, as a new BO is used for each batch, and waits are on the batch fence not the individual write into the query batch. (So no change whatsoever for waits from the current code.) Polling is improved, as there we can check the individual fence. Pipelined is unaffected. Now we could keep the query buffer across multiple batches and use fences (userspace seqno + batch handles) to poll or wait on the partial writes. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/9] i965: Map the query results for the life of the bo
If we map the bo upon creation, we can avoid the latency of mmapping it when querying, and later use the asynchronous, persistent map of the predicate to do a quick query. v2: Inline the wait on results; it disappears shortly in the next few patches. Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Matt Turner --- src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/gen6_queryobj.c | 40 --- 2 files changed, 29 insertions(+), 12 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3014fa68aff..840332294e6 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -459,6 +459,7 @@ struct brw_query_object { /** Last query BO associated with this query. */ struct brw_bo *bo; + uint64_t *results; /** Last index in bo with query data for this object. */ int last_index; diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c index ffdee4040fc..17c10b135d1 100644 --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c @@ -229,7 +229,9 @@ gen6_queryobj_get_results(struct gl_context *ctx, if (query->bo == NULL) return; - uint64_t *results = brw_bo_map(brw, query->bo, MAP_READ); + brw_bo_wait_rendering(query->bo); + uint64_t *results = query->results; + switch (query->Base.Target) { case GL_TIME_ELAPSED: /* The query BO contains the starting and ending timestamps. @@ -304,7 +306,6 @@ gen6_queryobj_get_results(struct gl_context *ctx, default: unreachable("Unrecognized query target in brw_queryobj_get_results()"); } - brw_bo_unmap(query->bo); /* Now that we've processed the data stored in the query's buffer object, * we can release it. @@ -315,6 +316,25 @@ gen6_queryobj_get_results(struct gl_context *ctx, query->Base.Ready = true; } +static int +gen6_alloc_query(struct brw_context *brw, struct brw_query_object *query) +{ + /* Since we're starting a new query, we need to throw away old results. */ + brw_bo_unreference(query->bo); + + query->bo = brw_bo_alloc(brw->bufmgr, +"query results", 4096, +BRW_MEMZONE_OTHER); + query->results = brw_bo_map(brw, query->bo, + MAP_COHERENT | MAP_PERSISTENT | + MAP_READ | MAP_ASYNC); + + /* For ARB_query_buffer_object: The result is not available */ + set_query_availability(brw, query, false); + + return 0; +} + /** * Driver hook for glBeginQuery(). * @@ -326,15 +346,7 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q) { struct brw_context *brw = brw_context(ctx); struct brw_query_object *query = (struct brw_query_object *)q; - const int idx = GEN6_QUERY_RESULTS; - - /* Since we're starting a new query, we need to throw away old results. */ - brw_bo_unreference(query->bo); - query->bo = - brw_bo_alloc(brw->bufmgr, "query results", 4096, BRW_MEMZONE_OTHER); - - /* For ARB_query_buffer_object: The result is not available */ - set_query_availability(brw, query, false); + const int idx = gen6_alloc_query(brw, query) + GEN6_QUERY_RESULTS; switch (query->Base.Target) { case GL_TIME_ELAPSED: @@ -548,8 +560,12 @@ gen6_query_counter(struct gl_context *ctx, struct gl_query_object *q) { struct brw_context *brw = brw_context(ctx); struct brw_query_object *query = (struct brw_query_object *)q; - brw_query_counter(ctx, q); + const int idx = gen6_alloc_query(brw, query) + GEN6_QUERY_RESULTS; + + brw_write_timestamp(brw, query->bo, idx); set_query_availability(brw, query, true); + + query->flushed = false; } /* Initialize Gen6+-specific query object functions. 
*/ -- 2.19.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/9] i965: Check last known busy status on bo before asking the kernel
If we know the bo is idle (that is we have no submitted a command buffer referencing this bo since the last query) we can skip asking the kernel. Note this may report a false negative if the target is being shared between processes (exported via dmabuf or flink). To allow the caller control over using the last known flag, the query is split into two. v2: Check against external bo before trusting our own tracking. Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Matt Turner --- src/mesa/drivers/dri/i965/brw_bufmgr.c | 40 -- src/mesa/drivers/dri/i965/brw_bufmgr.h | 11 +-- 2 files changed, 40 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index f1675b191c1..d9e8453787c 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -444,18 +444,40 @@ vma_free(struct brw_bufmgr *bufmgr, } } -int +static int +__brw_bo_busy(struct brw_bo *bo) +{ + struct drm_i915_gem_busy busy = { bo->gem_handle }; + + if (bo->idle && !bo->external) + return 0; + + /* If we hit an error here, it means that bo->gem_handle is invalid. +* Treat it as being idle (busy.busy is left as 0) and move along. +*/ + drmIoctl(bo->bufmgr->fd, DRM_IOCTL_I915_GEM_BUSY, &busy); + + bo->idle = !busy.busy; + return busy.busy; +} + +bool brw_bo_busy(struct brw_bo *bo) { - struct brw_bufmgr *bufmgr = bo->bufmgr; - struct drm_i915_gem_busy busy = { .handle = bo->gem_handle }; + return __brw_bo_busy(bo); +} - int ret = drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_BUSY, &busy); - if (ret == 0) { - bo->idle = !busy.busy; - return busy.busy; - } - return false; +bool +brw_bo_map_busy(struct brw_bo *bo, unsigned flags) +{ + unsigned mask; + + if (flags & MAP_WRITE) + mask = ~0u; + else + mask = 0x; + + return __brw_bo_busy(bo) & mask; } int diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index 32fc7a553c9..e1f46b091ce 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -323,10 +323,17 @@ int brw_bo_get_tiling(struct brw_bo *bo, uint32_t *tiling_mode, int brw_bo_flink(struct brw_bo *bo, uint32_t *name); /** - * Returns 1 if mapping the buffer for write could cause the process + * Returns false if mapping the buffer is not in active use by the gpu. + * If it returns true, any mapping for for write could cause the process * to block, due to the object being active in the GPU. */ -int brw_bo_busy(struct brw_bo *bo); +bool brw_bo_busy(struct brw_bo *bo); + +/** + * Returns true if mapping the buffer for the set of flags (i.e. MAP_READ or + * MAP_WRITE) will cause the process to block. + */ +bool brw_bo_map_busy(struct brw_bo *bo, unsigned flags); /** * Specify the volatility of the buffer. -- 2.19.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
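The intended use of the split is that a read-only access only conflicts with an outstanding GPU write, so a caller reading results back can skip the stall (and, thanks to the idle tracking, often the ioctl too) when the GPU merely still holds the buffer for reading; roughly, for a hypothetical caller:

    /* Reading back: only a pending GPU *write* would give us stale data. */
    if (brw_bo_map_busy(bo, MAP_READ))
       brw_bo_wait_rendering(bo);   /* stall only while a write is outstanding */

    void *map = brw_bo_map(brw, bo, MAP_READ | MAP_ASYNC);
    memcpy(results, map, size);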
[Mesa-dev] [PATCH 6/9] i965: Use 'available' fence for polling query results
If we always write the 'available' flag after writing the final result of the query, we can probe that predicate to quickly query whether the result is ready from userspace. The primary advantage of checking the predicate is that it allows for more fine-grained queries, we do not have to wait for the batch to finish before the query is marked as ready. We still do check the status of the batch after probing the query so that if the worst happens and the batch did hang without completing the query, we do not spin forever (although it is not as nice as completely eliminating the ioctl, the busy-ioctl is lightweight!). Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Matt Turner --- src/mesa/drivers/dri/i965/brw_context.h | 4 +- src/mesa/drivers/dri/i965/gen6_queryobj.c | 54 ++- 2 files changed, 25 insertions(+), 33 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 840332294e6..418941c9194 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -468,8 +468,8 @@ struct brw_query_object { bool flushed; }; -#define GEN6_QUERY_PREDICATE (2) -#define GEN6_QUERY_RESULTS (0) +#define GEN6_QUERY_PREDICATE (0) +#define GEN6_QUERY_RESULTS (1) static inline unsigned gen6_query_predicate_offset(const struct brw_query_object *query) diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c index dc70e2a568a..b6832588333 100644 --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c @@ -40,8 +40,7 @@ #include "intel_buffer_objects.h" static inline void -set_query_availability(struct brw_context *brw, struct brw_query_object *query, - bool available) +set_query_available(struct brw_context *brw, struct brw_query_object *query) { /* For platforms that support ARB_query_buffer_object, we write the * query availability for "pipelined" queries. @@ -58,22 +57,12 @@ set_query_availability(struct brw_context *brw, struct brw_query_object *query, * PIPE_CONTROL with an immediate write will synchronize with * those earlier writes, so we write 1 when the value has landed. */ - if (brw->ctx.Extensions.ARB_query_buffer_object && - brw_is_query_pipelined(query)) { - unsigned flags = PIPE_CONTROL_WRITE_IMMEDIATE; - if (available) { - /* Order available *after* the query results. */ - flags |= PIPE_CONTROL_FLUSH_ENABLE; - } else { - /* Make it unavailable *before* any pipelined reads. 
*/ - flags |= PIPE_CONTROL_CS_STALL; - } - - brw_emit_pipe_control_write(brw, flags, - query->bo, gen6_query_predicate_offset(query), - available); - } + brw_emit_pipe_control_write(brw, + PIPE_CONTROL_WRITE_IMMEDIATE | + PIPE_CONTROL_FLUSH_ENABLE, + query->bo, gen6_query_predicate_offset(query), + true); } static void @@ -144,12 +133,12 @@ write_xfb_overflow_streams(struct gl_context *ctx, } static bool -check_xfb_overflow_streams(uint64_t *results, int count) +check_xfb_overflow_streams(const uint64_t *results, int count) { bool overflow = false; for (int i = 0; i < count; i++) { - uint64_t *result_i = &results[4 * i]; + const uint64_t *result_i = &results[4 * i]; if ((result_i[3] - result_i[2]) != (result_i[1] - result_i[0])) { overflow = true; @@ -221,7 +210,8 @@ emit_pipeline_stat(struct brw_context *brw, struct brw_bo *bo, */ static void gen6_queryobj_get_results(struct gl_context *ctx, - struct brw_query_object *query) + struct brw_query_object *query, + uint64_t *results) { struct brw_context *brw = brw_context(ctx); const struct gen_device_info *devinfo = &brw->screen->devinfo; @@ -229,9 +219,6 @@ gen6_queryobj_get_results(struct gl_context *ctx, if (query->bo == NULL) return; - brw_bo_wait_rendering(query->bo); - uint64_t *results = query->results; - switch (query->Base.Target) { case GL_TIME_ELAPSED: /* The query BO contains the starting and ending timestamps. @@ -329,10 +316,10 @@ gen6_alloc_query(struct brw_context *brw, struct brw_query_object *query) query->results = brw_bo_map(brw, query->bo, MAP_COHERENT | MAP_PERSISTENT | - MAP_READ | MAP_ASYNC); + MAP_READ | MAP_WRITE); /* For ARB_query_buffer_object: The result is not available */ - set_query_availability(brw, query, false); + query->results[GEN6_QUERY_PREDICATE] = false; return 0;
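With the availability word always written (and written last), the non-blocking CheckQuery path reduces to peeking at the persistently mapped predicate, with a busy check as a backstop so a hung batch cannot leave the query reported as not-ready forever. A sketch of that approach (the remainder of the patch is truncated above, so this is indicative rather than the literal hunk):

    static void
    gen6_check_query(struct gl_context *ctx, struct gl_query_object *q)
    {
       struct brw_context *brw = brw_context(ctx);
       struct brw_query_object *query = (struct brw_query_object *) q;

       flush_batch_if_needed(brw, query);

       /* The availability word is ordered after the result writes by the
        * PIPE_CONTROL_FLUSH_ENABLE, so once it reads true the results are
        * valid; if the batch completed without ever setting it (a hang),
        * fall through rather than spinning forever. */
       if (query->results[GEN6_QUERY_PREDICATE] || !brw_bo_busy(query->bo)) {
          gen6_queryobj_get_results(ctx, query,
                                    query->results + GEN6_QUERY_RESULTS);
          query->Base.Ready = true;
       }
    }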
[Mesa-dev] [PATCH 9/9] i965: Set query->flush after flushing the query
Skip the next check for brw_batch_references() by recording when we flush the query. --- src/mesa/drivers/dri/i965/gen6_queryobj.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c index f3b9dd24624..d6e670c306e 100644 --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c @@ -509,14 +509,16 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q) static void flush_batch_if_needed(struct brw_context *brw, struct brw_query_object *query) { + if (query->flushed) + return; + /* If the batch doesn't reference the BO, it must have been flushed * (for example, due to being full). Record that it's been flushed. */ - query->flushed = query->flushed || -!brw_batch_references(&brw->batch, query->bo); - - if (!query->flushed) + if (brw_batch_references(&brw->batch, query->bo)) intel_batchbuffer_flush(brw); + + query->flushed = true; } /** -- 2.19.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/9] i965: Pack simple pipelined query objects into the same buffer
Reuse the same query object buffer for multiple queries within the same batch. A task for the future is propagating the GL_NO_MEMORY errors. Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Matt Turner --- src/mesa/drivers/dri/i965/brw_context.c | 3 ++ src/mesa/drivers/dri/i965/brw_context.h | 10 +++-- src/mesa/drivers/dri/i965/brw_queryobj.c | 16 +++ src/mesa/drivers/dri/i965/gen6_queryobj.c | 51 ++- 4 files changed, 59 insertions(+), 21 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 6ba64e4e06d..53912c9c98e 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -953,6 +953,8 @@ brwCreateContext(gl_api api, brw->isl_dev = screen->isl_dev; + brw->query.last_index = 4096; + brw->vs.base.stage = MESA_SHADER_VERTEX; brw->tcs.base.stage = MESA_SHADER_TESS_CTRL; brw->tes.base.stage = MESA_SHADER_TESS_EVAL; @@ -1164,6 +1166,7 @@ intelDestroyContext(__DRIcontext * driContextPriv) brw_bo_unreference(brw->tes.base.push_const_bo); brw_bo_unreference(brw->gs.base.push_const_bo); brw_bo_unreference(brw->wm.base.push_const_bo); + brw_bo_unreference(brw->query.bo); brw_destroy_hw_context(brw->bufmgr, brw->hw_ctx); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 418941c9194..917bb3a7baf 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -462,7 +462,7 @@ struct brw_query_object { uint64_t *results; /** Last index in bo with query data for this object. */ - int last_index; + unsigned index; /** True if we know the batch has been flushed since we ended the query. */ bool flushed; @@ -474,13 +474,13 @@ struct brw_query_object { static inline unsigned gen6_query_predicate_offset(const struct brw_query_object *query) { - return GEN6_QUERY_PREDICATE * sizeof(uint64_t); + return (query->index + GEN6_QUERY_PREDICATE) * sizeof(uint64_t); } static inline unsigned gen6_query_results_offset(const struct brw_query_object *query, unsigned idx) { - return (GEN6_QUERY_RESULTS + idx) * sizeof(uint64_t); + return (query->index + GEN6_QUERY_RESULTS + idx) * sizeof(uint64_t); } struct brw_reloc_list { @@ -1199,6 +1199,10 @@ struct brw_context } cc; struct { + struct brw_bo *bo; + uint64_t *map; + unsigned last_index; + struct brw_query_object *obj; bool begin_emitted; } query; diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c index bc4b8c43e7b..c77d630a138 100644 --- a/src/mesa/drivers/dri/i965/brw_queryobj.c +++ b/src/mesa/drivers/dri/i965/brw_queryobj.c @@ -188,7 +188,7 @@ brw_queryobj_get_results(struct gl_context *ctx, * run out of space in the query's BO and allocated a new one. If so, * this function was already called to accumulate the results so far. */ - for (i = 0; i < query->last_index; i++) { + for (i = 0; i < query->index; i++) { query->Base.Result += results[i * 2 + 1] - results[i * 2]; } break; @@ -198,7 +198,7 @@ brw_queryobj_get_results(struct gl_context *ctx, /* If the starting and ending PS_DEPTH_COUNT from any of the batches * differ, then some fragments passed the depth test. 
*/ - for (i = 0; i < query->last_index; i++) { + for (i = 0; i < query->index; i++) { if (results[i * 2 + 1] != results[i * 2]) { query->Base.Result = GL_TRUE; break; @@ -304,7 +304,7 @@ brw_begin_query(struct gl_context *ctx, struct gl_query_object *q) */ brw_bo_unreference(query->bo); query->bo = NULL; - query->last_index = -1; + query->index = -1; brw->query.obj = query; @@ -441,7 +441,7 @@ ensure_bo_has_space(struct gl_context *ctx, struct brw_query_object *query) assert(devinfo->gen < 6); - if (!query->bo || query->last_index * 2 + 1 >= 4096 / sizeof(uint64_t)) { + if (!query->bo || query->index * 2 + 1 >= 4096 / sizeof(uint64_t)) { if (query->bo != NULL) { /* The old query BO did not have enough space, so we allocated a new @@ -452,7 +452,7 @@ ensure_bo_has_space(struct gl_context *ctx, struct brw_query_object *query) } query->bo = brw_bo_alloc(brw->bufmgr, "query", 4096, BRW_MEMZONE_OTHER); - query->last_index = 0; + query->index = 0; } } @@ -490,7 +490,7 @@ brw_emit_query_begin(struct brw_context *brw) ensure_bo_has_space(ctx, query); - brw_write_depth_count(brw, query->bo, query->last_index * 2); + brw_write_depth_count(brw, query->bo, query->index * 2); brw->query.begin_emitted = true; } @@ -509,1
[Mesa-dev] [PATCH 8/9] i965: Pass consistent args along gen6_queryobj.c
Be consistent in passing along brw_context rather than switching between that and gl_context. Signed-off-by: Chris Wilson --- src/mesa/drivers/dri/i965/gen6_queryobj.c | 30 +++ 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c index f73f29e8524..f3b9dd24624 100644 --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c @@ -101,11 +101,10 @@ write_xfb_primitives_written(struct brw_context *brw, } static void -write_xfb_overflow_streams(struct gl_context *ctx, +write_xfb_overflow_streams(struct brw_context *brw, struct brw_bo *bo, int stream, int count, int idx) { - struct brw_context *brw = brw_context(ctx); const struct gen_device_info *devinfo = &brw->screen->devinfo; brw_emit_mi_flush(brw); @@ -209,16 +208,12 @@ emit_pipeline_stat(struct brw_context *brw, struct brw_bo *bo, * Wait on the query object's BO and calculate the final result. */ static void -gen6_queryobj_get_results(struct gl_context *ctx, +gen6_queryobj_get_results(struct brw_context *brw, struct brw_query_object *query, uint64_t *results) { - struct brw_context *brw = brw_context(ctx); const struct gen_device_info *devinfo = &brw->screen->devinfo; - if (query->bo == NULL) - return; - switch (query->Base.Target) { case GL_TIME_ELAPSED: /* The query BO contains the starting and ending timestamps. @@ -235,7 +230,7 @@ gen6_queryobj_get_results(struct gl_context *ctx, /* Ensure the scaled timestamp overflows according to * GL_QUERY_COUNTER_BITS */ - query->Base.Result &= (1ull << ctx->Const.QueryCounterBits.Timestamp) - 1; + query->Base.Result &= (1ull << brw->ctx.Const.QueryCounterBits.Timestamp) - 1; break; case GL_SAMPLES_PASSED_ARB: @@ -401,7 +396,7 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q) case GL_PRIMITIVES_GENERATED: write_primitives_generated(brw, query->bo, query->Base.Stream, idx); if (query->Base.Stream == 0) - ctx->NewDriverState |= BRW_NEW_RASTERIZER_DISCARD; + brw->ctx.NewDriverState |= BRW_NEW_RASTERIZER_DISCARD; break; case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN: @@ -409,11 +404,11 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q) break; case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB: - write_xfb_overflow_streams(ctx, query->bo, query->Base.Stream, 1, idx); + write_xfb_overflow_streams(brw, query->bo, query->Base.Stream, 1, idx); break; case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB: - write_xfb_overflow_streams(ctx, query->bo, 0, MAX_VERTEX_STREAMS, idx); + write_xfb_overflow_streams(brw, query->bo, 0, MAX_VERTEX_STREAMS, idx); break; case GL_VERTICES_SUBMITTED_ARB: @@ -464,7 +459,7 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q) case GL_PRIMITIVES_GENERATED: write_primitives_generated(brw, query->bo, query->Base.Stream, idx); if (query->Base.Stream == 0) - ctx->NewDriverState |= BRW_NEW_RASTERIZER_DISCARD; + brw->ctx.NewDriverState |= BRW_NEW_RASTERIZER_DISCARD; break; case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN: @@ -472,11 +467,11 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q) break; case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB: - write_xfb_overflow_streams(ctx, query->bo, query->Base.Stream, 1, idx); + write_xfb_overflow_streams(brw, query->bo, query->Base.Stream, 1, idx); break; case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB: - write_xfb_overflow_streams(ctx, query->bo, 0, MAX_VERTEX_STREAMS, idx); + write_xfb_overflow_streams(brw, query->bo, 0, MAX_VERTEX_STREAMS, idx); break; /* 
calculate overflow here */ @@ -535,6 +530,9 @@ static void gen6_wait_query(struct gl_context *ctx, struct gl_query_object *q) struct brw_context *brw = brw_context(ctx); struct brw_query_object *query = (struct brw_query_object *)q; + if (query->bo == NULL) + return; + /* If the application has requested the query result, but this batch is * still contributing to it, flush it now to finish that work so the * result will become available (eventually). @@ -545,7 +543,7 @@ static void gen6_wait_query(struct gl_context *ctx, struct gl_query_object *q) if (!results[GEN6_QUERY_PREDICATE]) /* not yet available, must wait */ brw_bo_wait_rendering(query->bo); - gen6_queryobj_get_results(ctx, query, results + GEN6_QUERY_RESULTS); + gen6_queryobj_get_results(brw, query, results + GEN6_QUERY_RESULTS); }
[Mesa-dev] [PATCH 2/9] i965: Replace hard-coded indices with const named variables in gen6_queryobj
To simplify replacement later, replace repeated use of explicit 0/1 with local variables of the same value. Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Matt Turner --- src/mesa/drivers/dri/i965/gen6_queryobj.c | 30 --- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c index ce9bb474e18..e3097e878aa 100644 --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c @@ -326,6 +326,7 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q) { struct brw_context *brw = brw_context(ctx); struct brw_query_object *query = (struct brw_query_object *)q; + const int idx = 0; /* Since we're starting a new query, we need to throw away old results. */ brw_bo_unreference(query->bo); @@ -356,31 +357,31 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q) * obtain the time elapsed. Notably, this includes time elapsed while * the system was doing other work, such as running other applications. */ - brw_write_timestamp(brw, query->bo, 0); + brw_write_timestamp(brw, query->bo, idx); break; case GL_ANY_SAMPLES_PASSED: case GL_ANY_SAMPLES_PASSED_CONSERVATIVE: case GL_SAMPLES_PASSED_ARB: - brw_write_depth_count(brw, query->bo, 0); + brw_write_depth_count(brw, query->bo, idx); break; case GL_PRIMITIVES_GENERATED: - write_primitives_generated(brw, query->bo, query->Base.Stream, 0); + write_primitives_generated(brw, query->bo, query->Base.Stream, idx); if (query->Base.Stream == 0) ctx->NewDriverState |= BRW_NEW_RASTERIZER_DISCARD; break; case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN: - write_xfb_primitives_written(brw, query->bo, query->Base.Stream, 0); + write_xfb_primitives_written(brw, query->bo, query->Base.Stream, idx); break; case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB: - write_xfb_overflow_streams(ctx, query->bo, query->Base.Stream, 1, 0); + write_xfb_overflow_streams(ctx, query->bo, query->Base.Stream, 1, idx); break; case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB: - write_xfb_overflow_streams(ctx, query->bo, 0, MAX_VERTEX_STREAMS, 0); + write_xfb_overflow_streams(ctx, query->bo, 0, MAX_VERTEX_STREAMS, idx); break; case GL_VERTICES_SUBMITTED_ARB: @@ -394,7 +395,7 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q) case GL_COMPUTE_SHADER_INVOCATIONS_ARB: case GL_TESS_CONTROL_SHADER_PATCHES_ARB: case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB: - emit_pipeline_stat(brw, query->bo, query->Base.Stream, query->Base.Target, 0); + emit_pipeline_stat(brw, query->bo, query->Base.Stream, query->Base.Target, idx); break; default: @@ -415,34 +416,35 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q) { struct brw_context *brw = brw_context(ctx); struct brw_query_object *query = (struct brw_query_object *)q; + const int idx = 1; switch (query->Base.Target) { case GL_TIME_ELAPSED: - brw_write_timestamp(brw, query->bo, 1); + brw_write_timestamp(brw, query->bo, idx); break; case GL_ANY_SAMPLES_PASSED: case GL_ANY_SAMPLES_PASSED_CONSERVATIVE: case GL_SAMPLES_PASSED_ARB: - brw_write_depth_count(brw, query->bo, 1); + brw_write_depth_count(brw, query->bo, idx); break; case GL_PRIMITIVES_GENERATED: - write_primitives_generated(brw, query->bo, query->Base.Stream, 1); + write_primitives_generated(brw, query->bo, query->Base.Stream, idx); if (query->Base.Stream == 0) ctx->NewDriverState |= BRW_NEW_RASTERIZER_DISCARD; break; case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN: - write_xfb_primitives_written(brw, query->bo, 
query->Base.Stream, 1); + write_xfb_primitives_written(brw, query->bo, query->Base.Stream, idx); break; case GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB: - write_xfb_overflow_streams(ctx, query->bo, query->Base.Stream, 1, 1); + write_xfb_overflow_streams(ctx, query->bo, query->Base.Stream, 1, idx); break; case GL_TRANSFORM_FEEDBACK_OVERFLOW_ARB: - write_xfb_overflow_streams(ctx, query->bo, 0, MAX_VERTEX_STREAMS, 1); + write_xfb_overflow_streams(ctx, query->bo, 0, MAX_VERTEX_STREAMS, idx); break; /* calculate overflow here */ @@ -458,7 +460,7 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q) case GL_TESS_CONTROL_SHADER_PATCHES_ARB: case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB: emit_pipeline_stat(brw, query->bo, - query->Base.Stream, query->Base.Target, 1); +
[Mesa-dev] [PATCH 5/9] i965: Use snoop bo for accessing query results on !llc
Ony non-llc architectures where we are primarily reading back the results of the GPU queries, then we can improve performance by using a cacheable mapping of the results. Unfortunately, enabling snooping makes the writes from the GPU slower, which may adversely affect pipelined query operations (where the results are used directly by the GPU and not CPU). Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Matt Turner --- src/mesa/drivers/dri/i965/brw_bufmgr.c| 24 +++ src/mesa/drivers/dri/i965/brw_bufmgr.h| 2 ++ src/mesa/drivers/dri/i965/gen6_queryobj.c | 2 ++ 3 files changed, 28 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index d9e8453787c..3c3bdee3d2a 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -946,6 +946,30 @@ brw_bo_unreference(struct brw_bo *bo) } } +static bool +__brw_bo_set_caching(struct brw_bo *bo, int caching) +{ + struct drm_i915_gem_caching arg = { + .handle = bo->gem_handle, + .caching = caching + }; + return drmIoctl(bo->bufmgr->fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg) == 0; +} + +void +brw_bo_set_cache_coherent(struct brw_bo *bo) +{ + assert(!bo->external); + if (bo->cache_coherent) + return; + + if (!__brw_bo_set_caching(bo, I915_CACHING_CACHED)) + return; + + bo->reusable = false; + bo->cache_coherent = true; +} + static void bo_wait_with_stall_warning(struct brw_context *brw, struct brw_bo *bo, diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index e1f46b091ce..6f0fe54f79f 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -273,6 +273,8 @@ void brw_bo_unreference(struct brw_bo *bo); #define MAP_INTERNAL_MASK (0xff << 24) #define MAP_RAW (0x01 << 24) +void brw_bo_set_cache_coherent(struct brw_bo *bo); + /** * Maps the buffer into userspace. * diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c index 17c10b135d1..dc70e2a568a 100644 --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c @@ -325,6 +325,8 @@ gen6_alloc_query(struct brw_context *brw, struct brw_query_object *query) query->bo = brw_bo_alloc(brw->bufmgr, "query results", 4096, BRW_MEMZONE_OTHER); + brw_bo_set_cache_coherent(query->bo); + query->results = brw_bo_map(brw, query->bo, MAP_COHERENT | MAP_PERSISTENT | MAP_READ | MAP_ASYNC); -- 2.19.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/9] i965: Replace open-coded gen6 queryobj offsets with simple helpers
Lots of places open-coded the assumed layout of the predicate/results within the query object, replace those with simple helpers. v2: Fix function decl style. Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Matt Turner --- .../drivers/dri/i965/brw_conditional_render.c | 10 -- src/mesa/drivers/dri/i965/brw_context.h| 15 +++ src/mesa/drivers/dri/i965/gen6_queryobj.c | 6 +++--- src/mesa/drivers/dri/i965/hsw_queryobj.c | 18 +- 4 files changed, 35 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c index e33e79fb6ce..0177a7f80b4 100644 --- a/src/mesa/drivers/dri/i965/brw_conditional_render.c +++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c @@ -87,8 +87,14 @@ set_predicate_for_occlusion_query(struct brw_context *brw, */ brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE); - brw_load_register_mem64(brw, MI_PREDICATE_SRC0, query->bo, 0 /* offset */); - brw_load_register_mem64(brw, MI_PREDICATE_SRC1, query->bo, 8 /* offset */); + brw_load_register_mem64(brw, + MI_PREDICATE_SRC0, + query->bo, + gen6_query_results_offset(query, 0)); + brw_load_register_mem64(brw, + MI_PREDICATE_SRC1, + query->bo, + gen6_query_results_offset(query, 1)); } static void diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 7fd15669eb9..3014fa68aff 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -467,6 +467,21 @@ struct brw_query_object { bool flushed; }; +#define GEN6_QUERY_PREDICATE (2) +#define GEN6_QUERY_RESULTS (0) + +static inline unsigned +gen6_query_predicate_offset(const struct brw_query_object *query) +{ + return GEN6_QUERY_PREDICATE * sizeof(uint64_t); +} + +static inline unsigned +gen6_query_results_offset(const struct brw_query_object *query, unsigned idx) +{ + return (GEN6_QUERY_RESULTS + idx) * sizeof(uint64_t); +} + struct brw_reloc_list { struct drm_i915_gem_relocation_entry *relocs; int reloc_count; diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c index e3097e878aa..ffdee4040fc 100644 --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c @@ -71,7 +71,7 @@ set_query_availability(struct brw_context *brw, struct brw_query_object *query, } brw_emit_pipe_control_write(brw, flags, - query->bo, 2 * sizeof(uint64_t), + query->bo, gen6_query_predicate_offset(query), available); } } @@ -326,7 +326,7 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q) { struct brw_context *brw = brw_context(ctx); struct brw_query_object *query = (struct brw_query_object *)q; - const int idx = 0; + const int idx = GEN6_QUERY_RESULTS; /* Since we're starting a new query, we need to throw away old results. 
*/ brw_bo_unreference(query->bo); @@ -416,7 +416,7 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q) { struct brw_context *brw = brw_context(ctx); struct brw_query_object *query = (struct brw_query_object *)q; - const int idx = 1; + const int idx = GEN6_QUERY_RESULTS + 1; switch (query->Base.Target) { case GL_TIME_ELAPSED: diff --git a/src/mesa/drivers/dri/i965/hsw_queryobj.c b/src/mesa/drivers/dri/i965/hsw_queryobj.c index 24f52a7d752..120733c759a 100644 --- a/src/mesa/drivers/dri/i965/hsw_queryobj.c +++ b/src/mesa/drivers/dri/i965/hsw_queryobj.c @@ -191,7 +191,7 @@ load_overflow_data_to_cs_gprs(struct brw_context *brw, struct brw_query_object *query, int idx) { - int offset = idx * sizeof(uint64_t) * 4; + int offset = gen6_query_results_offset(query, 0) + idx * sizeof(uint64_t) * 4; brw_load_register_mem64(brw, HSW_CS_GPR(1), query->bo, offset); @@ -283,7 +283,7 @@ hsw_result_to_gpr0(struct gl_context *ctx, struct brw_query_object *query, brw_load_register_mem64(brw, HSW_CS_GPR(0), query->bo, - 2 * sizeof(uint64_t)); + gen6_query_predicate_offset(query)); return; } @@ -300,7 +300,7 @@ hsw_result_to_gpr0(struct gl_context *ctx, struct brw_query_object *query, brw_load_register_mem64(brw, HSW_CS_GPR(0), query->bo, - 0 * sizeof(uint64_t)); + gen6_query_results_offset(query,
Re: [Mesa-dev] [PATCH] i965/batch: don't ignore the 'brw_new_batch' call for a 'new batch'
Quoting andrey simiklit (2018-08-21 13:00:57) > Hi all, > > The bug for this issue was created: > https://bugs.freedesktop.org/show_bug.cgi?id=107626 What about something like diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index 8536c040109..babb8d4676d 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -885,7 +885,7 @@ brw_draw_single_prim(struct gl_context *ctx, { struct brw_context *brw = brw_context(ctx); const struct gen_device_info *devinfo = &brw->screen->devinfo; - bool fail_next = false; + bool fail_next; /* Flag BRW_NEW_DRAW_CALL on every draw. This allows us to have * atoms that happen on every draw call. @@ -898,6 +898,7 @@ brw_draw_single_prim(struct gl_context *ctx, intel_batchbuffer_require_space(brw, 1500); brw_require_statebuffer_space(brw, 2400); intel_batchbuffer_save_state(brw); + fail_next = brw->batch.saved.map_next == 0; if (brw->num_instances != prim->num_instances || brw->basevertex != prim->basevertex || There's no point reverting to the last saved point if that save point is the empty batch, we will just repeat ourselves. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Bump aperture tracking to u64
As a prelude to handling large address spaces, first allow ourselves the luxury of handling the full 4G. Reported-by: Andrey Simiklit Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_context.h | 2 +- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 9 + src/mesa/drivers/dri/i965/intel_batchbuffer.h | 8 ++-- 3 files changed, 8 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index d3b96953467..ca1fe8ef0ea 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -509,7 +509,7 @@ struct intel_batchbuffer { int exec_array_size; /** The amount of aperture space (in bytes) used by all exec_bos */ - int aperture_space; + uint64_t aperture_space; struct { uint32_t *map_next; diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index 65d2c64e319..4363b146150 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -870,7 +870,7 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw, bytes_for_commands, 100.0f * bytes_for_commands / BATCH_SZ, bytes_for_state, 100.0f * bytes_for_state / STATE_SZ, brw->batch.exec_count, - (float) brw->batch.aperture_space / (1024 * 1024), + (float) (brw->batch.aperture_space / (1024 * 1024)), brw->batch.batch_relocs.reloc_count, brw->batch.state_relocs.reloc_count); @@ -890,13 +890,6 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw, return ret; } -bool -brw_batch_has_aperture_space(struct brw_context *brw, unsigned extra_space) -{ - return brw->batch.aperture_space + extra_space <= - brw->screen->aperture_threshold; -} - bool brw_batch_references(struct intel_batchbuffer *batch, struct brw_bo *bo) { diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h index d10948f1916..0632142cd31 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h @@ -44,8 +44,12 @@ int _intel_batchbuffer_flush_fence(struct brw_context *brw, void intel_batchbuffer_data(struct brw_context *brw, const void *data, GLuint bytes); -bool brw_batch_has_aperture_space(struct brw_context *brw, - unsigned extra_space_in_bytes); +static inline bool +brw_batch_has_aperture_space(struct brw_context *brw, uint64_t extra_space) +{ + return brw->batch.aperture_space + extra_space <= + brw->screen->aperture_threshold; +} bool brw_batch_references(struct intel_batchbuffer *batch, struct brw_bo *bo); -- 2.19.0.rc2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
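The need for the wider type is plain arithmetic: the total is summed in bytes, and a signed 32-bit int wraps at 2 GiB, well short of "the full 4G"; for instance:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
       int32_t  as_int = (int32_t)(3u << 30);  /* 3 GiB wraps to a negative value */
       uint64_t as_u64 = 3ull << 30;           /* 3 GiB tracked correctly */

       printf("%d vs %llu\n", as_int, (unsigned long long) as_u64);
       /* prints: -1073741824 vs 3221225472 */
       return 0;
    }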
Re: [Mesa-dev] [PATCH] i965: clearify map_gtt cases
Quoting Kenneth Graunke (2018-09-04 16:13:59) > On Tuesday, September 4, 2018 2:57:29 AM PDT Lionel Landwerlin wrote: > > Both brw_bo_map_cpu() & brw_bo_map_wc() assert if mapping the > > underlying BO fails. Failing back to brw_bo_map_gtt() doesn't seem to > > make any sense for that reason. > > > > We also only call brw_bo_map_gtt() for tiled buffers which as far as > > we know are never mapped coherent (coherent is a parameter reserved > > for buffer object which are always allocated linear). So explicitly > > assert if we ever run into this case. > > > > This makes checking the kernel about whether GTT maps are actually > > coherent unnecessary. > > > > Signed-off-by: Lionel Landwerlin > > --- > > src/mesa/drivers/dri/i965/brw_bufmgr.c | 21 ++--- > > 1 file changed, 2 insertions(+), 19 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > > b/src/mesa/drivers/dri/i965/brw_bufmgr.c > > index f1675b191c1..e0008905164 100644 > > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > > @@ -1095,6 +1095,8 @@ brw_bo_map_gtt(struct brw_context *brw, struct brw_bo > > *bo, unsigned flags) > > { > > struct brw_bufmgr *bufmgr = bo->bufmgr; > > > > + assert((flags & MAP_COHERENT) == 0); > > + > > /* Get a mapping of the buffer if we haven't before. */ > > if (bo->map_gtt == NULL) { > >DBG("bo_map_gtt: mmap %d (%s)\n", bo->gem_handle, bo->name); > > @@ -1186,25 +1188,6 @@ brw_bo_map(struct brw_context *brw, struct brw_bo > > *bo, unsigned flags) > > else > >map = brw_bo_map_wc(brw, bo, flags); > > > > - /* Allow the attempt to fail by falling back to the GTT where necessary. > > -* > > -* Not every buffer can be mmaped directly using the CPU (or WC), for > > -* example buffers that wrap stolen memory or are imported from other > > -* devices. For those, we have little choice but to use a GTT mmapping. > > -* However, if we use a slow GTT mmapping for reads where we expected > > fast > > -* access, that order of magnitude difference in throughput will be > > clearly > > -* expressed by angry users. > > -* > > -* We skip MAP_RAW because we want to avoid map_gtt's fence detiling. > > -*/ > > - if (!map && !(flags & MAP_RAW)) { > > - if (brw) { > > - perf_debug("Fallback GTT mapping for %s with access flags %x\n", > > -bo->name, flags); > > - } > > - map = brw_bo_map_gtt(brw, bo, flags); > > - } > > - > > return map; > > } > > > > > > This will break on Cherryview Chromebook kernels, which are too old to > support WC maps, and therefore fall back to GTT maps even for linear > buffers. :( > > Which brings up an awful point...in that case...we will hit GTT maps > with the MAP_COHERENT bit set. Assuming Cherryview has the buffering > Chris mentioned, then we're hosed. It seems like our options are: > > 1. Mandate new kernels with WC support (not likely to fly...) > 2. Disable ARB_buffer_storage/EXT_buffer_storage if the platform has >this buffering and the kernel doesn't support WC maps > 3. Just continue exposing the broken semi-coherent map, treat it as a >known bug that would be fixed by upgrading the kernel... > > Chris, what platforms are affected by this? All Baytrail+ atoms (which were the first we found it on) and it seems all next generation GGTT, Canonlake+. Option 3. 
The window of error is roughly the time it takes to do an uncached read, so the likelihood of two clients observing different values for the same location in memory is small (but non-zero), but honestly I expect it to be hidden behind inter-client synchronisation (or else they would be expecting stale data and don't care). The biggest risk (imo) would be something like using a batchbuffer where you want to modify commands after submission. (Who would do such a thing! It's bitten me more times than I'd care to admit ;) I can imagine a few graphical glitches, if you tried to stream textures while also processing them on the GPU. So for option 3, make the perf_debug() recommend a newer kernel to avoid taking the GTT path? I would like to keep the warning comment around so that as soon as we can, we can realise that assert. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
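For completeness, the runtime check that underpins option 2 (and the perf_debug advice in option 3) is the getparam added by the "Deny persistent mappings of incoherent GTT mmaps" patch later in this archive; a sketch of how userspace could query it, assuming the policy that a kernel too old to know the parameter is treated as having the old, coherent behaviour:

    #include <stdbool.h>
    #include <xf86drm.h>
    #include "drm-uapi/i915_drm.h"

    static bool
    kernel_has_coherent_gtt_mmap(int fd)
    {
       int coherent = -1;
       struct drm_i915_getparam gp = {
          .param = I915_PARAM_MMAP_GTT_COHERENT,
          .value = &coherent,
       };

       if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) != 0)
          return true;   /* parameter unknown: old kernel, old behaviour */

       return coherent != 0;
    }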
Re: [Mesa-dev] [PATCH v2] i965: clearify map_gtt cases
Quoting Lionel Landwerlin (2018-09-04 11:30:42) > Both brw_bo_map_cpu() & brw_bo_map_wc() assert if mapping the > underlying BO fails. Failing back to brw_bo_map_gtt() doesn't seem to > make any sense for that reason. > > We also only call brw_bo_map_gtt() for tiled buffers which as far as > we know are never mapped coherent (coherent is a parameter reserved > for buffer objects which are always allocated linear). So explicitly > assert if we ever run into this case. > > This makes checking the kernel about whether GTT maps are actually > coherent unnecessary. > > v2: Add some explanation (Chris/Lionel) > > Signed-off-by: Lionel Landwerlin > --- > src/mesa/drivers/dri/i965/brw_bufmgr.c | 34 -- > 1 file changed, 15 insertions(+), 19 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > b/src/mesa/drivers/dri/i965/brw_bufmgr.c > index f1675b191c1..95c5439a521 100644 > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > @@ -1095,6 +1095,21 @@ brw_bo_map_gtt(struct brw_context *brw, struct brw_bo > *bo, unsigned flags) > { > struct brw_bufmgr *bufmgr = bo->bufmgr; > > + /* It was recently discovered that some platforms cannot do coherent GTT > +* mappings. The kernel API previously assumed that they were. To deal > with > +* what is effectively a change in ABI, the kernel introduced a > +* I915_PARAM_MMAP_GTT_COHERENT getparam to let userspace know whether the > +* underlying platform is capable of coherent GTT mappings. > +* > +* Some digging in the i965 codebase revealed that we don't actually do > +* coherent GTT mappings. Coherent mappings are only used for buffer > +* objects which are allocated linear and we only call this function for > +* non linear buffers. The couple of asserts below are a reminder of the > +* cases where we should use GTT maps. > +*/ > + assert(bo->tiling_mode != I915_TILING_NONE); > + assert((flags & MAP_COHERENT) == 0); Lucky you only have to support platforms with working WC :) The explanation matches with my understanding, Reviewed-by: Chris Wilson -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: clearify map_gtt cases
Quoting Lionel Landwerlin (2018-09-04 10:57:29) > Both brw_bo_map_cpu() & brw_bo_map_wc() assert if mapping the > underlying BO fails. Failing back to brw_bo_map_gtt() doesn't seem to > make any sense for that reason. > > We also only call brw_bo_map_gtt() for tiled buffers which as far as > we know are never mapped coherent (coherent is a parameter reserved > for buffer object which are always allocated linear). So explicitly > assert if we ever run into this case. > > This makes checking the kernel about whether GTT maps are actually > coherent unnecessary. > > Signed-off-by: Lionel Landwerlin > --- > src/mesa/drivers/dri/i965/brw_bufmgr.c | 21 ++--- > 1 file changed, 2 insertions(+), 19 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > b/src/mesa/drivers/dri/i965/brw_bufmgr.c > index f1675b191c1..e0008905164 100644 > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > @@ -1095,6 +1095,8 @@ brw_bo_map_gtt(struct brw_context *brw, struct brw_bo > *bo, unsigned flags) > { > struct brw_bufmgr *bufmgr = bo->bufmgr; > /* Please do explain why :) * * We have been doing a good job explaining the intricacies with both * the kernel and hw interfaces, so keep up the good work! * * I do think it is worth mentioning the incoherency issue, so that if * anybody ever suggests GTT mmaps are a good idea, we can lambaste them. */ -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Deny persistent mappings of incoherent GTT mmaps
Quoting Lionel Landwerlin (2018-09-04 01:07:12) > Talking with Ken about this, it seems we might not actually use coherent > GTT maps, because those are just for buffers (which are allocated linear). > GTT maps are only used with tiled buffers. > > So we most likely don't even need this patch. > The code is confusing though. Still document that you don't :-p So reduce it to an assert? -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Deny persistent mappings of incoherent GTT mmaps
Quoting Lionel Landwerlin (2018-08-31 12:32:23) > On 31/08/2018 12:22, Chris Wilson wrote: > > Quoting Lionel Landwerlin (2018-08-31 12:16:19) > >> We would need a fairly recent kernel (drm-tip?) to test this in CI. > > Unpatched mesa, assumes all is fine. > > Post-patch mesa, assumes all is broken. > > > > So we can quickly see if anything actually fails if a persistent GGTT > > mmap is rejected. Which is the important part for determining if such > > exclusion will harm anyone. The problem is then is the risk of > > corruption worth keeping it around. > > > >> I can't see any issue with this because we always have the meta path as > >> a fallback for tiled buffers. > > I'm worried if the mmap actually leaks through to glMapBufferRange with > > say GL_MAP_PERSISTENT_BIT. Hmm, maybe that's all ok so long at the > > client flushes are explicit. > > > As far as I can tell, buffers are linear, so brw_bo_map() wouldn't even > try to map in gtt at first. > > Then brw_bo_map_wc() would assert if it failed. > > So we're fine? :) I like that plan. Not having to rely on GGTT will save a lot of headaches. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Deny persistent mappings of incoherent GTT mmaps
Quoting Lionel Landwerlin (2018-08-31 12:16:19) > We would need a fairly recent kernel (drm-tip?) to test this in CI. Unpatched mesa, assumes all is fine. Post-patch mesa, assumes all is broken. So we can quickly see if anything actually fails if a persistent GGTT mmap is rejected. Which is the important part for determining if such exclusion will harm anyone. The problem is then is the risk of corruption worth keeping it around. > I can't see any issue with this because we always have the meta path as > a fallback for tiled buffers. I'm worried if the mmap actually leaks through to glMapBufferRange with say GL_MAP_PERSISTENT_BIT. Hmm, maybe that's all ok so long at the client flushes are explicit. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Deny persistent mappings of incoherent GTT mmaps
On more recent HW, the indirect writes via the GGTT are internally buffered presenting a nuisance to trying to use them for persistent client maps. (We cannot guarantee that any write by the client will then be visible to either the GPU or third parties in a timely manner, leading to corruption.) Fortunately, we try very hard not to even use the GGTT for anything and even less so for persistent mmaps, so their loss is unlikely to be noticed. No piglits harmed. Cc: Kenneth Graunke Cc: Lionel Landwerlin Cc: Joonas Lahtinen --- include/drm-uapi/i915_drm.h | 22 +++ src/mesa/drivers/dri/i965/brw_bufmgr.c | 36 src/mesa/drivers/dri/i965/intel_screen.c | 3 ++ src/mesa/drivers/dri/i965/intel_screen.h | 1 + 4 files changed, 62 insertions(+) diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h index 16e452aa12d..268b585f8a4 100644 --- a/include/drm-uapi/i915_drm.h +++ b/include/drm-uapi/i915_drm.h @@ -529,6 +529,28 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51 +/* + * Once upon a time we supposed that writes through the GGTT would be + * immediately in physical memory (once flushed out of the CPU path). However, + * on a few different processors and chipsets, this is not necessarily the case + * as the writes appear to be buffered internally. Thus a read of the backing + * storage (physical memory) via a different path (with different physical tags + * to the indirect write via the GGTT) will see stale values from before + * the GGTT write. Inside the kernel, we can for the most part keep track of + * the different read/write domains in use (e.g. set-domain), but the assumption + * of coherency is baked into the ABI, hence reporting its true state in this + * parameter. + * + * Reports true when writes via mmap_gtt are immediately visible following an + * lfence to flush the WCB. + * + * Reports false when writes via mmap_gtt are indeterminately delayed in an in + * internal buffer and are _not_ immediately visible to third parties accessing + * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC + * communications channel when reporting false is strongly disadvised. + */ +#define I915_PARAM_MMAP_GTT_COHERENT 52 + typedef struct drm_i915_getparam { __s32 param; /* diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index f1675b191c1..6955c5c890c 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -1068,6 +1068,19 @@ brw_bo_map_wc(struct brw_context *brw, struct brw_bo *bo, unsigned flags) return bo->map_wc; } +static void +bo_set_domain(struct brw_bo *bo, unsigned int read, unsigned int write) +{ + struct brw_bufmgr *bufmgr = bo->bufmgr; + + struct drm_i915_gem_set_domain arg = { + .handle = bo->gem_handle, + .read_domains = read, + .write_domain = write, + }; + drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &arg); +} + /** * Perform an uncached mapping via the GTT. * @@ -1095,6 +1108,25 @@ brw_bo_map_gtt(struct brw_context *brw, struct brw_bo *bo, unsigned flags) { struct brw_bufmgr *bufmgr = bo->bufmgr; + /* Once upon a time we believed that there was no internal buffering of +* the indirect writes via the Global GTT; that is once flushed from +* the processor the write would be immediately visible to any one +* else reading that memory location be they the GPU, kernel or another +* client. As it turns out, on modern hardware there is an internal buffer +* that cannot be directly flushed (e.g. 
using the sfence one would normally +* use to flush the WCB) and so far the w/a requires us to do an uncached +* mmio read, quite expensive and requires user cooperation. That is we +* cannot simply support persistent user access to the GTT mmap buffers +* as we have no means of flushing their writes in a timely manner. +*/ + if (flags & MAP_PERSISTENT && + flags & MAP_COHERENT && + flags & MAP_WRITE && + !(brw->screen->kernel_features & KERNEL_ALLOWS_COHERENT_MMAP_GTT)) { + DBG("bo_map_gtt: rejected attempt to make a coherent, persistent and writable GGTT mmap, %d (%s)\n", bo->gem_handle, bo->name); + return NULL; + } + /* Get a mapping of the buffer if we haven't before. */ if (bo->map_gtt == NULL) { DBG("bo_map_gtt: mmap %d (%s)\n", bo->gem_handle, bo->name); @@ -1138,6 +1170,10 @@ brw_bo_map_gtt(struct brw_context *brw, struct brw_bo *bo, unsigned flags) if (!(flags & MAP_ASYNC)) { bo_wait_with_stall_warning(brw, bo, "GTT mapping"); } + if (flags & MAP_WRITE && + !(brw->screen->kernel_features & KERNEL_ALLOWS_COHERENT_MMAP_GTT)) { + bo_set_domain(bo, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT); + } return bo->map_gtt; } diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/in
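For reference, the KERNEL_ALLOWS_COHERENT_MMAP_GTT feature bit consumed by the brw_bo_map_gtt() check above would presumably be filled in by probing the new parameter during screen setup. A minimal sketch of such a probe, following the getparam pattern already used in intel_screen.c (the helper name is illustrative, not from the patch):

   #include <stdbool.h>
   #include <xf86drm.h>
   #include "drm-uapi/i915_drm.h"

   /* Illustrative probe: only report coherent GTT mmaps when the kernel
    * explicitly says so. Old kernels reject the parameter (-EINVAL), and
    * the conservative answer there is "not coherent", i.e. keep denying
    * persistent, coherent, writable GGTT mmaps. */
   static bool
   probe_coherent_mmap_gtt(int fd)
   {
      int value = 0;
      struct drm_i915_getparam gp = {
         .param = I915_PARAM_MMAP_GTT_COHERENT,
         .value = &value,
      };

      if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
         return false;

      return value != 0;
   }

With something like that in place, screen->kernel_features gains KERNEL_ALLOWS_COHERENT_MMAP_GTT only on kernels that report coherency, and the map path above falls back to the explicit set-domain otherwise.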
Re: [Mesa-dev] [PATCH] i965/icl: Set Enabled Texel Offset Precision Fix bit
Quoting Anuj Phogat (2018-08-28 18:53:59) > h/w specification requires this bit to be always set. > > Suggested-by: Kenneth Graunke > Signed-off-by: Anuj Phogat > --- > src/mesa/drivers/dri/i965/brw_defines.h | 4 > src/mesa/drivers/dri/i965/brw_state_upload.c | 7 +++ > 2 files changed, 11 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index 433314115b1..1c73ddeb190 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > @@ -1673,6 +1673,10 @@ enum brw_pixel_shader_coverage_mask_mode { > # define GLK_SCEC_BARRIER_MODE_3D_HULL (1 << 7) > # define GLK_SCEC_BARRIER_MODE_MASKREG_MASK(1 << 7) > > +#define HALF_SLICE_CHICKEN70xE194 > +# define TEXEL_OFFSET_FIX_ENABLE (1 << 1) > +# define TEXEL_OFFSET_FIX_MASK REG_MASK(1 << 7) That mask doesn't match the enable-bit. It'll probably be quite useful to record all the registers you are setting as part of the global setup and read them back later to make sure they stuck. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
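On the masked-write convention: for these chicken registers the upper 16 bits of the dword are a per-bit write-enable mask, so the mask define has to name the same bit as the value it guards. A sketch of the corrected pair and how it might be programmed, assuming the REG_MASK() helper from brw_defines.h and the existing brw_load_register_imm32() helper:

   #define HALF_SLICE_CHICKEN7                     0xE194
   # define TEXEL_OFFSET_FIX_ENABLE                (1 << 1)
   # define TEXEL_OFFSET_FIX_MASK                  REG_MASK(1 << 1)

   /* Pair the value with its write-enable mask so the bit actually lands. */
   brw_load_register_imm32(brw, HALF_SLICE_CHICKEN7,
                           TEXEL_OFFSET_FIX_MASK | TEXEL_OFFSET_FIX_ENABLE);

Reading the register back into a scratch bo after the one-time setup, as suggested above, is a cheap way to confirm the bits stuck.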
[Mesa-dev] [PATCH 2/4] i965: Allow creation of brw_bo from system memory (userptr)
Since v3.16 (though universal access was only enabled by default in v4.6), the kernel has offered the ability to wrap any system memory (i.e. RAM and not I/O mapped memory) into an object that can be used by the GPU. The caveat is that this object is marked as cache coherent (so that the client can continue accessing the memory blissfully ignorant of the synchronisation required with the GPU) and on !llc platforms this means that the object is snooped. Snooping imposes a large performance penalty and is only advised to be used for one-off transfers. However, it provides another useful tool in the driver toolbox. --- src/mesa/drivers/dri/i965/brw_bufmgr.c | 70 +++- src/mesa/drivers/dri/i965/brw_bufmgr.h | 8 +++ src/mesa/drivers/dri/i965/intel_screen.c | 20 +++ src/mesa/drivers/dri/i965/intel_screen.h | 1 + 4 files changed, 98 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index f1675b191c1..ea7886d6df5 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -723,6 +723,74 @@ brw_bo_alloc_tiled_2d(struct brw_bufmgr *bufmgr, const char *name, flags, tiling, stride); } +/* + * Wrap the chunk of client memory given by ptr+size inside a GPU + * buffer, and make it cache coherent (though on non-LLC architectures + * this requires snooping on explicit cache flushes). This allows the + * caller to write into the memory chunk and for those writes to be + * visible on the GPU (exactly as if they create the buffer and then + * persistently mapped it to obtain the pointer). + */ +struct brw_bo * +brw_bo_alloc_userptr(struct brw_bufmgr *bufmgr, + const char *name, void *ptr, uint64_t size) +{ + struct brw_bo *bo = calloc(1, sizeof(*bo)); + if (!bo) + return NULL; + + bo->bufmgr = bufmgr; + bo->name = name; + p_atomic_set(&bo->refcount, 1); + + bo->size = size; + bo->map_cpu = ptr; + bo->userptr = true; + bo->reusable = false; + bo->cache_coherent = true; + bo->idle = true; + bo->kflags = bufmgr->initial_kflags; + + bo->tiling_mode = I915_TILING_NONE; + bo->swizzle_mode = I915_BIT_6_SWIZZLE_NONE; + bo->stride = 0; + + struct drm_i915_gem_userptr arg = { + .user_ptr = (uintptr_t)ptr, + .user_size = size, + .flags = 0, + }; + if (drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_USERPTR, &arg)) { + free(bo); + return NULL; + } + + bo->gem_handle = arg.handle; + + /* Check the buffer for validity before we try and use it in a batch */ + struct drm_i915_gem_set_domain sd = { + .handle = bo->gem_handle, + .read_domains = I915_GEM_DOMAIN_CPU, + }; + if (drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd)) + goto err_free; + + if (brw_using_softpin(bufmgr)) { + bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, size, 1); + if (bo->gtt_offset == 0ull) + goto err_free; + } + + VG_DEFINED(ptr, size); /* Presume we write to it using the GPU */ + return bo; + +err_free: + mtx_lock(&bufmgr->lock); + bo_free(bo); + mtx_unlock(&bufmgr->lock); + return NULL; +} + /** * Returns a brw_bo wrapping the given buffer object handle. 
* @@ -813,7 +881,7 @@ bo_free(struct brw_bo *bo) { struct brw_bufmgr *bufmgr = bo->bufmgr; - if (bo->map_cpu) { + if (bo->map_cpu && !bo->userptr) { VG_NOACCESS(bo->map_cpu, bo->size); drm_munmap(bo->map_cpu, bo->size); } diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index 32fc7a553c9..ba9cf67b2ec 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -193,6 +193,11 @@ struct brw_bo { * Boolean of whether this buffer is cache coherent */ bool cache_coherent; + + /** +* Boolean of whether this buffer is a userptr +*/ + bool userptr:1; }; #define BO_ALLOC_BUSY (1<<0) @@ -227,6 +232,9 @@ struct brw_bo *brw_bo_alloc_tiled(struct brw_bufmgr *bufmgr, uint32_t pitch, unsigned flags); +struct brw_bo *brw_bo_alloc_userptr(struct brw_bufmgr *bufmgr, +const char *name, void *ptr, uint64_t size); + /** * Allocate a tiled buffer object. * diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index f1c195c5d14..1ba2f021fcd 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1889,6 +1889,22 @@ intel_detect_swizzling(struct intel_screen *screen) return swizzle_mode != I915_BIT_6_SWIZZLE_NONE; } +static bool +intel_detect_userptr(struct intel_screen *screen) +{ + struct drm_i915_gem_userptr arg = { + .user_ptr = -4096ULL, + .user_size = 8192, /* invalid 64b wrap around */ + }; + + if (screen->devinfo.has_snoop_bug)
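As a usage sketch (not part of the patch): a caller wrapping its own allocation is expected to hand in page-aligned memory and keep it alive for as long as the bo exists, for example:

   /* Hypothetical caller: wrap 1 MiB of client memory for GPU access.
    * The kernel requires a page-aligned pointer and size, and the memory
    * must outlive the brw_bo (and any GPU work using it). */
   static struct brw_bo *
   wrap_client_memory(struct brw_context *brw, void **out_ptr)
   {
      const size_t size = 1 << 20;
      void *ptr;

      if (posix_memalign(&ptr, 4096, size) != 0)
         return NULL;

      struct brw_bo *bo =
         brw_bo_alloc_userptr(brw->bufmgr, "client data", ptr, size);
      if (!bo) {
         /* e.g. userptr unusable on this kernel/platform (snoop bug) */
         free(ptr);
         return NULL;
      }

      *out_ptr = ptr;
      return bo;
   }

On LLC parts the wrapped pages behave like any other cached bo; on !llc they are snooped, so as the commit message says this is best reserved for one-off transfers rather than hot-path allocations.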
[Mesa-dev] [PATCH 1/4] intel: Mark i965g/i965gm as having buggy snoop access
Recent kernels do exclude snoop access for i965g/i965gm as it does not work as advertised. However to avoid depending on a recent kernel for old hardware, mark the presence of the bug in gen_device_info. See kernel commit df0700e53047662c167836bd6fdeea55d5d8dcfa Author: Chris Wilson Date: Wed Sep 6 20:24:24 2017 +0100 drm/i915: Disable snooping (userptr, set-cache-level) on gen4 --- src/intel/dev/gen_device_info.c | 1 + src/intel/dev/gen_device_info.h | 8 2 files changed, 9 insertions(+) diff --git a/src/intel/dev/gen_device_info.c b/src/intel/dev/gen_device_info.c index b0ae4d18034..d0449a49389 100644 --- a/src/intel/dev/gen_device_info.c +++ b/src/intel/dev/gen_device_info.c @@ -93,6 +93,7 @@ gen_get_pci_device_id_override(void) static const struct gen_device_info gen_device_info_i965 = { .gen = 4, + .has_snoop_bug = true, .has_negative_rhw_bug = true, .num_slices = 1, .num_subslices = { 1, }, diff --git a/src/intel/dev/gen_device_info.h b/src/intel/dev/gen_device_info.h index 291a3cce8f8..80da9f713fa 100644 --- a/src/intel/dev/gen_device_info.h +++ b/src/intel/dev/gen_device_info.h @@ -80,6 +80,14 @@ struct gen_device_info */ bool has_negative_rhw_bug; + /** +* Some specific Intel chipset do not invalidate the CPU cache from the +* GPU for a snooped address, leading to stale data being read by the CPU +* and incorrect results. Enabling this flag will prevent the driver from +* using snooped access, e.g. userptr. +*/ + bool has_snoop_bug; + /** * Some versions of Gen hardware don't do centroid interpolation correctly * on unlit pixels, causing incorrect values for derivatives near triangle -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] i965: Expose AMD_pinned_memory
All GEN GPU can bind to any piece of memory (thanks UMA), and so through a special ioctl we can map a chunk of page-aligned client memory into the GPU address space. However, not all GEN are equal. Some have cache-coherency between the CPU and the GPU, whilst the others are incoherent and rely on snooping on explicit flushes to push/pull dirty data. Whereas we can use client buffers as a general replacement for kernel allocated buffers with LLC (cache coherency), using snooped buffers behaves differently and so must be used with care. AMD_pinned_memory supposes that the client memory buffer is suitable for any general usage (e.g. vertex data, texture data) and so only on LLC can we offer that extension. --- .../drivers/dri/i965/intel_buffer_objects.c | 68 +-- .../drivers/dri/i965/intel_buffer_objects.h | 6 ++ src/mesa/drivers/dri/i965/intel_extensions.c | 11 +++ 3 files changed, 65 insertions(+), 20 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c index 452e6d33c07..4b34b55793b 100644 --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c @@ -72,6 +72,23 @@ mark_buffer_invalid(struct intel_buffer_object *intel_obj) intel_obj->valid_data_end = 0; } +/** Allocates a new brw_bo to store the data for the buffer object. */ +static void +mark_new_state(struct brw_context *brw, + struct intel_buffer_object *intel_obj) +{ + /* the buffer might be bound as a uniform buffer, need to update it +*/ + if (intel_obj->Base.UsageHistory & USAGE_UNIFORM_BUFFER) + brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; + if (intel_obj->Base.UsageHistory & USAGE_SHADER_STORAGE_BUFFER) + brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; + if (intel_obj->Base.UsageHistory & USAGE_TEXTURE_BUFFER) + brw->ctx.NewDriverState |= BRW_NEW_TEXTURE_BUFFER; + if (intel_obj->Base.UsageHistory & USAGE_ATOMIC_COUNTER_BUFFER) + brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; +} + /** Allocates a new brw_bo to store the data for the buffer object. 
*/ static void alloc_buffer_object(struct brw_context *brw, @@ -96,20 +113,28 @@ alloc_buffer_object(struct brw_context *brw, */ size += 64 * 32; /* max read length of 64 256-bit units */ } + + assert(!intel_obj->pinned); intel_obj->buffer = brw_bo_alloc(brw->bufmgr, "bufferobj", size, BRW_MEMZONE_OTHER); - /* the buffer might be bound as a uniform buffer, need to update it -*/ - if (intel_obj->Base.UsageHistory & USAGE_UNIFORM_BUFFER) - brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; - if (intel_obj->Base.UsageHistory & USAGE_SHADER_STORAGE_BUFFER) - brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; - if (intel_obj->Base.UsageHistory & USAGE_TEXTURE_BUFFER) - brw->ctx.NewDriverState |= BRW_NEW_TEXTURE_BUFFER; - if (intel_obj->Base.UsageHistory & USAGE_ATOMIC_COUNTER_BUFFER) - brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; + mark_new_state(brw, intel_obj); + mark_buffer_inactive(intel_obj); + mark_buffer_invalid(intel_obj); +} + +static void +alloc_userptr_object(struct brw_context *brw, + struct intel_buffer_object *intel_obj, + GLsizeiptrARB size, + const GLvoid *data) +{ + intel_obj->buffer = + brw_bo_alloc_userptr(brw->bufmgr, "bufferobj(userptr)", + (void *)data, size); + intel_obj->pinned = true; + mark_new_state(brw, intel_obj); mark_buffer_inactive(intel_obj); mark_buffer_invalid(intel_obj); } @@ -119,6 +144,7 @@ release_buffer(struct intel_buffer_object *intel_obj) { brw_bo_unreference(intel_obj->buffer); intel_obj->buffer = NULL; + intel_obj->pinned = false; } /** @@ -192,10 +218,6 @@ brw_buffer_data(struct gl_context *ctx, struct brw_context *brw = brw_context(ctx); struct intel_buffer_object *intel_obj = intel_buffer_object(obj); - /* Part of the ABI, but this function doesn't use it. -*/ - (void) target; - intel_obj->Base.Size = size; intel_obj->Base.Usage = usage; intel_obj->Base.StorageFlags = storageFlags; @@ -207,12 +229,16 @@ brw_buffer_data(struct gl_context *ctx, release_buffer(intel_obj); if (size != 0) { - alloc_buffer_object(brw, intel_obj); + if (target != GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD) + alloc_buffer_object(brw, intel_obj); + else + alloc_userptr_object(brw, intel_obj, size, data); if (!intel_obj->buffer) return false; if (data != NULL) { - brw_bo_subdata(intel_obj->buffer, 0, size, data); + if (target != GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD) +brw_bo_subdata(intel_obj->buffer, 0, size, data); mark_buffer_valid_data(intel_obj, 0, size); } } @@ -275,9 +301,10 @@ brw_buffer_subdata(struct gl_context *ctx,
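From the application's side the extension is consumed by sourcing a buffer object from client memory via the AMD external-virtual-memory target; a minimal client-side sketch (page-aligning the allocation is the application's responsibility, and the memory must stay valid until the buffer is deleted):

   #include <stdlib.h>
   #include <GL/gl.h>
   #include <GL/glext.h>

   /* Pin 4 MiB of application memory as a buffer object. */
   static GLuint
   pin_client_memory(void **out_mem)
   {
      GLuint buf;

      if (posix_memalign(out_mem, 4096, 4 << 20) != 0)
         return 0;

      glGenBuffers(1, &buf);
      glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, buf);
      glBufferData(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, 4 << 20, *out_mem,
                   GL_STREAM_DRAW);
      /* The buffer can now be bound as GL_ARRAY_BUFFER etc., and the
       * application may keep writing through *out_mem directly. */
      return buf;
   }

Since the client keeps writing into the pinned pages behind the driver's back, the extension is only safe to expose where the wrapped pages are fully cache coherent, i.e. LLC.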
[Mesa-dev] [PATCH 4/4] docs/relnotes: Add AMD_pinned_memory for i965
Technically only for Sandybridge and later core designs, but finally we can claim support for allowing clients to create glBufferObjects from their own memory. --- docs/relnotes/18.3.0.html | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/relnotes/18.3.0.html b/docs/relnotes/18.3.0.html index ac2cc1e893b..4ab9fb44736 100644 --- a/docs/relnotes/18.3.0.html +++ b/docs/relnotes/18.3.0.html @@ -53,6 +53,7 @@ Note: some of the new features are only available with certain drivers. GL_AMD_framebuffer_multisample_advanced on radeonsi. GL_EXT_window_rectangles on radeonsi. +GL_AMD_pinned_memory on i965 Bug fixes -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Reuse the same single-page bo for all zero sized allocations
Quoting Michal Srb (2018-08-15 09:22:19) > Hi, > > This is my first attempt to review patch for Mesa, so please take it with a > grain of salt. > > On úterý 14. srpna 2018 20:21:40 CEST Chris Wilson wrote: > > @@ -504,6 +506,24 @@ bo_alloc_internal(struct brw_bufmgr *bufmgr, > > bool busy = false; > > bool zeroed = false; > > > > + /* Reuse the same bo for all zero-sized requests */ > > + if (size == 0) { > > + if (bufmgr->bo_zero == NULL) { > > + bo = bo_alloc_internal(bufmgr, "zero", 4096, > > +BRW_MEMZONE_OTHER, BO_ALLOC_BUSY, 0, 0); > > + if (!bo) > > +return NULL; > > + > > + bo->size = 0; > > Doesn't this break something once the bo_zero is freed? Either after two > threads raced and one has to cleanup or when the whole bufmgr is destroyed. > > The bucket_for_size will choose different bucket for sizes 4096 and 0: > size 4096 -> bucket index 0 -> some bucket (I assume) > size0 -> bucket index 119 -> null bucket (always too big) > > So the get_bucket_allocator function will return some bucket when allocating > and null bucket when freeing, so allocation goes thru bucket_vma_alloc and > freeing thru util_vma_heap_free. I did not examine those functions closer, > but > it does sound like it would break something. If it doesn't fit into a bucket, it will be freed immediately. The only down side is that as it is part of a bucket_vma, it will pin that bucket_vma forever, but that is not such a big deal as it evaporates with the bufmgr. To make the no-bucket more explicit, bo->reusable = false. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Reuse the same single-page bo for all zero sized allocations
Similar in principle to how malloc(0) can return the same constant address knowing that the client is not allowed to access any of its bytes (as they do not exist!), we can return the same bo for all zero sized allocation requests. Having a single identifier should help track down the redundant allocations, while in the mean time making sure that we do not waste any extra pages on them. Signed-off-by: Chris Wilson Cc: Sergii Romantsov Cc: Lionel Landwerlin Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_bufmgr.c | 24 1 file changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index 09d45e30ecc..12d207b36d2 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -148,6 +148,8 @@ struct brw_bufmgr { int num_buckets; time_t time; + struct brw_bo *bo_zero; + struct hash_table *name_table; struct hash_table *handle_table; @@ -504,6 +506,24 @@ bo_alloc_internal(struct brw_bufmgr *bufmgr, bool busy = false; bool zeroed = false; + /* Reuse the same bo for all zero-sized requests */ + if (size == 0) { + if (bufmgr->bo_zero == NULL) { + bo = bo_alloc_internal(bufmgr, "zero", 4096, +BRW_MEMZONE_OTHER, BO_ALLOC_BUSY, 0, 0); + if (!bo) +return NULL; + + bo->size = 0; + + if (p_atomic_cmpxchg(&bufmgr->bo_zero, NULL, bo)) +brw_bo_unreference(bo); + } + + brw_bo_reference(bufmgr->bo_zero); + return bufmgr->bo_zero; + } + if (flags & BO_ALLOC_BUSY) busy = true; @@ -1179,6 +1199,8 @@ can_map_cpu(struct brw_bo *bo, unsigned flags) void * brw_bo_map(struct brw_context *brw, struct brw_bo *bo, unsigned flags) { + assert(bo->size); + if (bo->tiling_mode != I915_TILING_NONE && !(flags & MAP_RAW)) return brw_bo_map_gtt(brw, bo, flags); @@ -1297,6 +1319,8 @@ brw_bo_wait(struct brw_bo *bo, int64_t timeout_ns) void brw_bufmgr_destroy(struct brw_bufmgr *bufmgr) { + brw_bo_unreference(bufmgr->bo_zero); + mtx_destroy(&bufmgr->lock); /* Free any cached buffer objects we were going to reuse */ -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
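The caller-visible behaviour follows the malloc(0) analogy: every zero-sized request returns a new reference to the same shared, single-page bo, and mapping it is disallowed. A rough illustration (test-style, assuming the patch above):

   struct brw_bo *a = brw_bo_alloc(bufmgr, "vbo", 0, BRW_MEMZONE_OTHER);
   struct brw_bo *b = brw_bo_alloc(bufmgr, "ibo", 0, BRW_MEMZONE_OTHER);

   assert(a == b);              /* one identifier for every zero-sized bo */
   assert(a->size == 0);        /* the reported size stays zero... */
   assert(a->gem_handle != 0);  /* ...but one real 4KiB object backs it */

   /* brw_bo_map() now asserts bo->size != 0, so nobody can touch the
    * non-existent bytes. */

   brw_bo_unreference(b);
   brw_bo_unreference(a);       /* bufmgr->bo_zero keeps its own reference
                                   until brw_bufmgr_destroy() */

The p_atomic_cmpxchg() covers two threads racing to create bo_zero: the loser simply drops its extra bo and everyone ends up sharing the winner's.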
Re: [Mesa-dev] [PATCH] intel: tools: aubwrite: split gen[89] from gen10+
Quoting Lionel Landwerlin (2018-07-30 17:08:47) > On 30/07/18 16:45, Chris Wilson wrote: > > Quoting Lionel Landwerlin (2018-07-30 16:28:37) > >> Gen10+ has an additional bit in MI_BATCH_BUFFER_END to signal the end > >> of the context image. > > Hmm, do you think we should perhaps include the BBE in the protocontext > > we create in the kernel? > > -Chris > > > > I can't even see where the BBE is in the i915 code base :( There isn't one, it should be in execlists_init_reg_state() if required in the initial context image, and even if not strictly required, for completeness. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] intel: tools: aubwrite: split gen[89] from gen10+
Quoting Lionel Landwerlin (2018-07-30 16:28:37) > Gen10+ has an additional bit in MI_BATCH_BUFFER_END to signal the end > of the context image. Hmm, do you think we should perhaps include the BBE in the protocontext we create in the kernel? -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] intel/ppgtt: memory address alignment
Quoting Sergii Romantsov (2018-07-25 10:37:29) > Hello, Chris. > Your variant also works. > But i wonder about comment: > /* If we don't have caching at this size, don't actually round the > * allocation up. > */ > if (bucket == NULL) { > > Has it any sense now? If 'no' - will delete it in next patch update. It's actually talking about rounding up to the bucket size, which started off as the next power of two and has since been reduced to some fractions of that; even now the rounding is debatable. The page-size allocation is a property of the uABI -- objects are allocated in whole pages. There is no reason the bufmgr can't do sub-page allocations; it would take a bit of work to factor out the offset_in_page() later, but it is doable. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] intel/ppgtt: memory address alignment
Quoting Sergii Romantsov (2018-07-25 08:42:55) > Hello, > here is a backtrace: ... Please try: diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index 09d45e30ecc..8274c2e0b2f 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -496,7 +496,6 @@ bo_alloc_internal(struct brw_bufmgr *bufmgr, uint32_t stride) { struct brw_bo *bo; - unsigned int page_size = getpagesize(); int ret; struct bo_cache_bucket *bucket; bool alloc_from_cache; @@ -522,12 +521,12 @@ bo_alloc_internal(struct brw_bufmgr *bufmgr, * allocation up. */ if (bucket == NULL) { - bo_size = size; - if (bo_size < page_size) - bo_size = page_size; + unsigned int page_size = getpagesize(); + bo_size = ALIGN(size, page_size); } else { bo_size = bucket->size; } + assert(bo_size); mtx_lock(&bufmgr->lock); /* Get a buffer out of the cache if available */ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] intel/ppgtt: memory address alignment
Quoting Lionel Landwerlin (2018-07-24 13:45:18) > On 24/07/18 13:42, Chris Wilson wrote: > > Quoting Lionel Landwerlin (2018-07-24 13:34:57) > >> That looks correct to me (and we do the same in Anv). > >> Also a bit baffled that we haven't run into issues earlier :( > > All the allocations should be in multiples of page size, alignment less > > than a page size should be a no-op. Tracking down who doesn't think > > IS_ALIGNED(bo->size, PAGE_SIZE) would be interesting. > > -Chris > > > Buckets? The bucket size is still a multiple of PAGE_SIZE, and gtt_offsets within a bucket are computed as an offset from the base of the bucket (i.e. vma_alloc() is only called once for the entire bucket, and it acts as a cache to reduce the number of vma operations.) -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
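To spell that out: the per-bucket allocator grabs one large block of address space up front and then hands out fixed-size slots from it, so individual bo allocations and frees never touch vma_alloc() at all. A simplified sketch of the idea (the real code in brw_bufmgr.c keeps a list of such nodes per bucket and memzone; details like the 64-slot bitmap are paraphrased from memory and may not match exactly):

   struct vma_bucket_node {
      uint64_t start_address;   /* base of a 64-slot block of address space */
      uint64_t bitmap;          /* one bit per free slot */
   };

   static uint64_t
   bucket_slot_alloc(struct brw_bufmgr *bufmgr, enum brw_memory_zone memzone,
                     struct vma_bucket_node *node, uint64_t slot_size)
   {
      if (node->bitmap == 0) {
         /* Only when a block is exhausted do we ask the real vma
          * allocator for another 64 slots worth of address space. */
         node->start_address =
            vma_alloc(bufmgr, memzone, 64ull * slot_size, slot_size);
         node->bitmap = ~0ull;
      }

      int slot = __builtin_ctzll(node->bitmap);
      node->bitmap &= ~(1ull << slot);

      /* A bo's gtt_offset is just an offset from the block base; freeing
       * re-sets the bit rather than calling back into the vma allocator. */
      return node->start_address + (uint64_t)slot * slot_size;
   }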
Re: [Mesa-dev] [PATCH v2] intel/ppgtt: memory address alignment
Quoting Lionel Landwerlin (2018-07-24 13:34:57) > That looks correct to me (and we do the same in Anv). > Also a bit baffled that we haven't run into issues earlier :( All the allocations should be in multiples of the page size, so an alignment of less than a page should be a no-op. Tracking down whoever produces a bo for which IS_ALIGNED(bo->size, PAGE_SIZE) does not hold would be interesting. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/miptree: Fix can_blit_slice()
Quoting Nanley Chery (2018-07-23 18:17:15) > Satisfy the BLT engine's row pitch limitation on the destination > miptree. The destination miptree is untiled, so its row_pitch will be > slightly less than or equal to the source miptree's row_pitch. Use the > source miptree's row_pitch in can_blit_slice instead of its blt_pitch. > > Fixes 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3 > ("i965/miptree: Use the correct BLT pitch") > > Cc: > Reported-by: Dylan Baker > --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index a18d5ac3624..d8e823e4826 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -3544,8 +3544,13 @@ static bool > can_blit_slice(struct intel_mipmap_tree *mt, > unsigned int level, unsigned int slice) > { > - /* See intel_miptree_blit() for details on the 32k pitch limit. */ > - if (intel_miptree_blt_pitch(mt) >= 32768) > + /* The blit destination is untiled, so its row_pitch will be slightly less > +* than or equal to the source's row_pitch. The BLT engine only supports > +* linear row pitches up to but not including 32k. > +* > +* See intel_miptree_blit() for details on the 32k pitch limit. > +*/ > + if (mt->surf.row_pitch >= 32768) >return false; I see the difference, but do we copy the whole slice or a region of it? -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Context aware user space EU control through application
Quoting aravindan.muthuku...@intel.com (2018-07-20 09:32:57) > From: "Muthukumar, Aravindan" > > The Patch here is to give control to user/ application to really > decide what's the max GPU load it would put. If that can be > known in advance, rpcs can be programmed accordingly. > This solution has changes across i915, > drm and mesa (not limited only to kernel). > > Here, we pass gpu_load_type = {high, medium, low} from application > while context is created. Default here is 'High' and applications > roughly know if they are going to eat up entire GPU. The typical > usecase of 'Low' is idle screen or minor mouse movements. Users can > read meaning of high/medium/low for their platform & then program > contexts accordingly. Here gpu_load_type directly translates to > number of shader cores/EUs a particular GPU has. > > Signed-off-by: Aravindan Muthukumar > Signed-off-by: Kedar J Karanje > Signed-off-by: Praveen Diwakar > Signed-off-by: Yogesh Marathe > +/* Dynamic Eu control */ > +struct drm_i915_load_type { > + __u32 ctx_id; > + __u32 load_type; > +}; > + > +/* DYNAMIC EU CONTROL */ > +int > +brw_hw_context_load_type(struct brw_bufmgr *bufmgr, > +uint32_t ctx_id, > +int load_type) > +{ > + struct drm_i915_load_type type = { > + .ctx_id = ctx_id, > + .load_type = load_type, > + }; > + int err; > + > + err = 0; > + if(drmIoctl(bufmgr->fd, DRM_IOCTL_I915_LOAD_TYPE, &type)) > + err = -errno; This went through 4 people and none noticed that there already exists a means to set per-context parameters. And it's even used right next to this function. The word hint needs to be firmly embedded around here. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
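The existing mechanism being pointed at is the per-context set-param ioctl, the same interface brw_hw_context_set_priority() already goes through. A sketch of what routing the hint through it would look like; note that I915_CONTEXT_PARAM_LOAD_TYPE_HINT below is hypothetical, it does not exist in the real uapi and only illustrates the plumbing:

   /* Hypothetical parameter number, for illustration only. */
   #define I915_CONTEXT_PARAM_LOAD_TYPE_HINT 0x42

   static int
   brw_hw_context_set_load_hint(struct brw_bufmgr *bufmgr,
                                uint32_t ctx_id, uint64_t hint)
   {
      struct drm_i915_gem_context_param p = {
         .ctx_id = ctx_id,
         .param = I915_CONTEXT_PARAM_LOAD_TYPE_HINT,
         .value = hint,
      };

      if (drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p))
         return -errno;

      return 0;
   }

That keeps the uapi surface small (one more context parameter rather than a whole new ioctl), and calling it a hint makes clear the kernel is free to ignore it.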
Re: [Mesa-dev] [PATCH 2/3] i965/miptree: Drop an if case from retile_as_linear
Quoting Nanley Chery (2018-07-12 18:28:15) > Drop an if statement whose predicate never evaluates to true. row_pitch > belongs to a surface with non-linear tiling. According to > isl_calc_tiled_min_row_pitch, the pitch is a multiple of the tile width. > By looking at isl_tiling_get_info, we see that non-linear tilings have > widths greater than or equal to 128B. Yup, we only have non-linear tilings at this point and the pitch has to be a multiple of the tile width. > Cc: Reviewed-by: Chris Wilson -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965/miptree: Use the correct BLT pitch
Quoting Nanley Chery (2018-07-12 18:28:16) > Retile miptrees to a linear tiling less often. Retiling can cause issues > with imported BOs. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106738 > Suggested-by: Chris Wilson > Cc: Reviewed-by: Chris Wilson -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] i965: Make blt_pitch public
Quoting Nanley Chery (2018-07-12 18:28:14) > We'd like to reuse this helper. > > Cc: Reviewed-by: Chris Wilson -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 05/16] util: rb-tree: A simple, invasive, red-black tree
Quoting Lionel Landwerlin (2018-06-21 17:29:04) > From: Jason Ekstrand > > This is a simple, invasive, liberally licensed red-black tree > implementation. It's an invasive data structure similar to the > Linux kernel linked-list where the intention is that you embed a s/linked-list/rbtree/ Might as well compare like for like. > rb_node struct the data structure you intend to put into the > tree. > > The implementation is mostly based on the one in "Introduction to > Algorithms", third edition, by Cormen, Leiserson, Rivest, and > Stein. There were a few other key design points: > > * It's an invasive data structure similar to the [Linux kernel >linked list]. > > * It uses NULL for leaves instead of a sentinel. This means a few >algorithms differ a small bit from the ones in "Introduction to >Algorithms". > > * All search operations are inlined so that the compiler can >optimize away the function pointer call. > --- > src/util/Makefile.sources | 2 + > src/util/meson.build | 2 + > src/util/rb_tree.c| 421 ++ > src/util/rb_tree.h| 269 No tester? Insert/remove 1,000,000 u32 and check the post-order is sorted and has correct coloring? (I'm stealing ideas from kernel/lib/rbtree_test.c fwiw.) -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
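On the missing tester, a rough sketch of the kind of stress test being asked for, written against the API proposed in this patch (rb_tree_init/insert/remove, rb_tree_first, rb_node_next and the rb_node_data container macro; if the final names or the comparator convention differ, adjust to match). It only checks ordering via an in-order walk; verifying the red-black colouring invariants would additionally need access to the node colour:

   #include <assert.h>
   #include <stdint.h>
   #include <stdlib.h>
   #include "util/rb_tree.h"

   struct test_node {
      struct rb_node node;
      uint32_t key;
   };

   static int
   cmp_test_node(const struct rb_node *a, const struct rb_node *b)
   {
      const struct test_node *ta = rb_node_data(struct test_node, a, node);
      const struct test_node *tb = rb_node_data(struct test_node, b, node);
      return (ta->key > tb->key) - (ta->key < tb->key);
   }

   static void
   stress(unsigned count)
   {
      struct rb_tree tree;
      struct test_node *nodes = calloc(count, sizeof(*nodes));

      rb_tree_init(&tree);
      for (unsigned i = 0; i < count; i++) {
         nodes[i].key = rand();
         rb_tree_insert(&tree, &nodes[i].node, cmp_test_node);
      }

      /* An in-order walk of a valid tree must come out sorted. */
      uint32_t prev = 0;
      for (struct rb_node *n = rb_tree_first(&tree); n; n = rb_node_next(n)) {
         uint32_t key = rb_node_data(struct test_node, n, node)->key;
         assert(key >= prev);
         prev = key;
      }

      for (unsigned i = 0; i < count; i++)
         rb_tree_remove(&tree, &nodes[i].node);

      free(nodes);
   }

Something like stress(1000000), plus a variant that interleaves removals with fresh insertions, would cover the 1,000,000-element suggestion above.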
Re: [Mesa-dev] [PATCH 1/2] drm/i915/gtt: Enable full-ppgtt by default everywhere
Quoting Chris Wilson (2018-06-18 11:10:23) > We should now have all the kinks worked out and full-ppgtt now works > reliably on gen7 (Ivybridge, Valleyview/Baytrail and Haswell). If we can > let userspace have full control over their own ppgtt, it makes softpinning > far more effective, in turn making GPU dispatch far more efficient and > more secure (due to better mm segregation). On the other hand, switching > over to a different GTT for every client does incur noticeable overhead. > > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Mika Kuoppala > Cc: Matthew Auld > Reviewed-by: Joonas Lahtinen > Cc: Jason Ekstrand > Cc: Kenneth Graunke > --- > > This has been run through piglit for ivb/vlv/hsw locally and hsw on > kernel's CI, but we would like at least one ack from Mesa as well. If > it's possible to run it through the full gamut of your testing, that > would be great. The patches apply to drm-tip, and require fixes in drm-tip for at least one GPU hang that full-ppgtt caused in piglit. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] drm/i915/gtt: Full ppgtt everywhere, no excuses
We believe we have all the kinks worked out, even for the early Valleyview devices, for whom we currently disable all ppgtt. References: 62942ed7279d ("drm/i915/vlv: disable PPGTT on early revs v3") Signed-off-by: Chris Wilson Cc: Ville Syrjälä Cc: Joonas Lahtinen Reviewed-by: Joonas Lahtinen Acked-by: Ville Syrjälä --- drivers/gpu/drm/i915/i915_gem_gtt.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 5ef5176e10fe..dbc55ae234ac 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -173,12 +173,6 @@ int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv, return 0; } - /* Early VLV doesn't have this */ - if (IS_VALLEYVIEW(dev_priv) && dev_priv->drm.pdev->revision < 0xb) { - DRM_DEBUG_DRIVER("disabling PPGTT on pre-B3 step VLV\n"); - return 0; - } - if (has_full_48bit_ppgtt) return 3; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] drm/i915/gtt: Enable full-ppgtt by default everywhere
We should now have all the kinks worked out and full-ppgtt now works reliably on gen7 (Ivybridge, Valleyview/Baytrail and Haswell). If we can let userspace have full control over their own ppgtt, it makes softpinning far more effective, in turn making GPU dispatch far more efficient and more secure (due to better mm segregation). On the other hand, switching over to a different GTT for every client does incur noticeable overhead. Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Cc: Mika Kuoppala Cc: Matthew Auld Reviewed-by: Joonas Lahtinen Cc: Jason Ekstrand Cc: Kenneth Graunke --- This has been run through piglit for ivb/vlv/hsw locally and hsw on kernel's CI, but we would like at least one ack from Mesa as well. If it's possible to run it through the full gamut of your testing, that would be great. -Chris --- drivers/gpu/drm/i915/i915_gem_gtt.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index c6aa761ca085..5ef5176e10fe 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -179,13 +179,11 @@ int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv, return 0; } - if (HAS_LOGICAL_RING_CONTEXTS(dev_priv)) { - if (has_full_48bit_ppgtt) - return 3; + if (has_full_48bit_ppgtt) + return 3; - if (has_full_ppgtt) - return 2; - } + if (has_full_ppgtt) + return 2; return 1; } -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] [RFC] i965/blit: bump some limits to 64k
Quoting Nanley Chery (2018-06-14 19:46:09) > On Thu, Jun 14, 2018 at 10:01:18AM -0700, Nanley Chery wrote: > > On Thu, Jun 14, 2018 at 04:18:30PM +0300, Martin Peres wrote: > > > This fixes screenshots using 8k+ wide display setups in modesetting. > > > > > > Chris Wilson even recommended the changes in intel_mipmap_tree.c > > > should read 131072 instead of 65535, but I for sure got confused by > > > his explanation. > > > > > > In any case, I would like to use this RFC patch as a forum to discuss > > > why the fallback path is broken[1], and as to what should be the > > > limits for HW-accelerated blits now that we got rid of the blitter > > > usage on recent platforms. > > > > > > > Hi, > > > > My understanding is that the fallback path is broken because we silently > > ignore miptree_create_for_bo's request for a tiled miptree. This results > > in some parts of mesa treating the surface as tiled and other parts of > > treating the surface as linear. > > > > I couldn't come up with a piglit test for this when I was working on a > > fix. Please let me know if you can think of any. > > > > I think what the limits should be depends on which mesa branch you're > > working off of. > > > > * On the master branch of mesa, which has some commits which reduce the > > dependence on the BLT engine, we can remove these limits by using BLORP. > > As much as I can tell, BLORP can handle images as wide as the surface > > pitch limit in the RENDER_SURFACE_STATE packet will allow. > > > > I sent out a series [a] a couple weeks ago that removes the limits > > imposed by the hardware blitter. > > > > * On the stable branch however, we can modify some incorrect code to set > > the correct BLT limits (as Chris has suggested). The BLT engine's pitch > > field is a signed 16bit integer, whose unit changes depending on the > > tiling of the surface. For linear surfaces, it's in units of bytes and > > for non-linear surfaces, it's in units of dwords. This translates to > > 2^15-1 bytes or (2^15-1) * 4 bytes respectively. > > > > I made a branch [b] which does this already, but I think my rebasing + > > testing strategy for stable branches on the CI might be incorrect. > > I just rebased this branch onto master and tested it on the CI. > Everything passes except for SNB. I get 1 GPU hang and two test > failures: > * failure-gpu-hang-otc-gfxtest-snbgt1-01-snbm64.compile.error > * KHR-GL33.shaders.uniform_block.random.all_shared_buffer.3.snbm64 > * dEQP-EGL.functional.color_clears.multi_context.gles3.rgba_pbuffer.snbm64 What are the command lines? Assuming piglit, which the last one doesn't appear to be, I can try, see what happens, and see if I can be of assistance. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] intel: aubinator: fix decoding with softpin
Quoting Lionel Landwerlin (2018-06-10 13:15:10) > Now that we're softpinning the address of our BOs in anv & i965, the > addresses selected start at the top of the addressing space. This is a > problem for the current implementation of aubinator which uses only a > 40bit mmapped address space. > > This change keeps track of all the memory writes from the aub file and > fetch them on request by the batch decoder. As a result we can get rid > of the 1<<40 mmapped address space and only rely on the mmap aub file > \o/ > > Signed-off-by: Lionel Landwerlin > --- > src/intel/tools/aubinator.c | 116 +--- > 1 file changed, 68 insertions(+), 48 deletions(-) > > diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c > index 3120e82b22e..4a60a606f63 100644 > --- a/src/intel/tools/aubinator.c > +++ b/src/intel/tools/aubinator.c > @@ -68,8 +68,14 @@ char *input_file = NULL, *xml_path = NULL; > struct gen_device_info devinfo; > struct gen_batch_decode_ctx batch_ctx; > > -uint64_t gtt_size, gtt_end; > -void *gtt; > +static struct memory_chunk { > + void *map; > + uint64_t address; > + uint32_t size; Why is size only 32b? I didn't see where large bo would be split into multiple chunks. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/5] docs/relnotes: Add AMD_pinned_memory for i965
Technically only for Sandybridge and later core designs, but finally we can claim support for allowing clients to create glBufferObjects from their own memory. --- docs/relnotes/18.2.0.html | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html index d377a91491c..bc32aaa8bb3 100644 --- a/docs/relnotes/18.2.0.html +++ b/docs/relnotes/18.2.0.html @@ -52,6 +52,7 @@ Note: some of the new features are only available with certain drivers. GL_ARB_fragment_shader_interlock on i965 +GL_AMD_pinned_memory on i965 Bug fixes -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/5] i965: Expose AMD_pinned_memory
All GEN GPU can bind to any piece of memory (thanks UMA), and so through a special ioctl we can map a chunk of page-aligned client memory into the GPU address space. However, not all GEN are equal. Some have cache-coherency between the CPU and the GPU, whilst the others are incoherent and rely on snooping on explicit flushes to push/pull dirty data. Whereas we can use client buffers as a general replacement for kernel allocated buffers with LLC (cache coherency), using snooped buffers behaves differently and so must be used with care. AMD_pinned_memory supposes that the client memory buffer is suitable for any general usage (e.g. vertex data, texture data) and so only on LLC can we offer that extension. --- .../drivers/dri/i965/intel_buffer_objects.c | 68 +-- .../drivers/dri/i965/intel_buffer_objects.h | 6 ++ src/mesa/drivers/dri/i965/intel_extensions.c | 11 +++ 3 files changed, 65 insertions(+), 20 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c index 452e6d33c07..4b34b55793b 100644 --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c @@ -72,6 +72,23 @@ mark_buffer_invalid(struct intel_buffer_object *intel_obj) intel_obj->valid_data_end = 0; } +/** Allocates a new brw_bo to store the data for the buffer object. */ +static void +mark_new_state(struct brw_context *brw, + struct intel_buffer_object *intel_obj) +{ + /* the buffer might be bound as a uniform buffer, need to update it +*/ + if (intel_obj->Base.UsageHistory & USAGE_UNIFORM_BUFFER) + brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; + if (intel_obj->Base.UsageHistory & USAGE_SHADER_STORAGE_BUFFER) + brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; + if (intel_obj->Base.UsageHistory & USAGE_TEXTURE_BUFFER) + brw->ctx.NewDriverState |= BRW_NEW_TEXTURE_BUFFER; + if (intel_obj->Base.UsageHistory & USAGE_ATOMIC_COUNTER_BUFFER) + brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; +} + /** Allocates a new brw_bo to store the data for the buffer object. 
*/ static void alloc_buffer_object(struct brw_context *brw, @@ -96,20 +113,28 @@ alloc_buffer_object(struct brw_context *brw, */ size += 64 * 32; /* max read length of 64 256-bit units */ } + + assert(!intel_obj->pinned); intel_obj->buffer = brw_bo_alloc(brw->bufmgr, "bufferobj", size, BRW_MEMZONE_OTHER); - /* the buffer might be bound as a uniform buffer, need to update it -*/ - if (intel_obj->Base.UsageHistory & USAGE_UNIFORM_BUFFER) - brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; - if (intel_obj->Base.UsageHistory & USAGE_SHADER_STORAGE_BUFFER) - brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; - if (intel_obj->Base.UsageHistory & USAGE_TEXTURE_BUFFER) - brw->ctx.NewDriverState |= BRW_NEW_TEXTURE_BUFFER; - if (intel_obj->Base.UsageHistory & USAGE_ATOMIC_COUNTER_BUFFER) - brw->ctx.NewDriverState |= BRW_NEW_UNIFORM_BUFFER; + mark_new_state(brw, intel_obj); + mark_buffer_inactive(intel_obj); + mark_buffer_invalid(intel_obj); +} + +static void +alloc_userptr_object(struct brw_context *brw, + struct intel_buffer_object *intel_obj, + GLsizeiptrARB size, + const GLvoid *data) +{ + intel_obj->buffer = + brw_bo_alloc_userptr(brw->bufmgr, "bufferobj(userptr)", + (void *)data, size); + intel_obj->pinned = true; + mark_new_state(brw, intel_obj); mark_buffer_inactive(intel_obj); mark_buffer_invalid(intel_obj); } @@ -119,6 +144,7 @@ release_buffer(struct intel_buffer_object *intel_obj) { brw_bo_unreference(intel_obj->buffer); intel_obj->buffer = NULL; + intel_obj->pinned = false; } /** @@ -192,10 +218,6 @@ brw_buffer_data(struct gl_context *ctx, struct brw_context *brw = brw_context(ctx); struct intel_buffer_object *intel_obj = intel_buffer_object(obj); - /* Part of the ABI, but this function doesn't use it. -*/ - (void) target; - intel_obj->Base.Size = size; intel_obj->Base.Usage = usage; intel_obj->Base.StorageFlags = storageFlags; @@ -207,12 +229,16 @@ brw_buffer_data(struct gl_context *ctx, release_buffer(intel_obj); if (size != 0) { - alloc_buffer_object(brw, intel_obj); + if (target != GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD) + alloc_buffer_object(brw, intel_obj); + else + alloc_userptr_object(brw, intel_obj, size, data); if (!intel_obj->buffer) return false; if (data != NULL) { - brw_bo_subdata(intel_obj->buffer, 0, size, data); + if (target != GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD) +brw_bo_subdata(intel_obj->buffer, 0, size, data); mark_buffer_valid_data(intel_obj, 0, size); } } @@ -275,9 +301,10 @@ brw_buffer_subdata(struct gl_context *ctx,
[Mesa-dev] [PATCH 3/5] i965: Use blorp+userptr for GPU readback
The primary benefit for this is that we get format conversion for "free", along with detiling and cache flushing (most relevant for !llc). Using the GPU does impose a bandwidth cost that is presumably better used for rendering, hence we limit the use to readback into client memory (not pbo) where we would need to stall on the GPU anyway. (Uploads remain direct/staged to avoid the synchronisation cost.) And we only use the GPU path if a direct read into client memory from video memory is unavailable. The ultimate user of this is Xorg/glamor! On byt, bsw, bxt (and presumably but not measured ilk), x11perf -shmget500 is improved by about 5-fold, but only equivalent to enabling movntqda, which is the preferred path. Though conversely the overhead of executing and waiting upon an additional blorp batch is shown by x11perf -shmget10 being reduced by a factor of 2. I think it is fair to presume that large copies will dominate (and that the overhead of a single batch is something that we can iteratively reduce, for the benefit of all.) llc machines continue to use direct access where there is no format changes (which one hopes is the typical use case). Opens: - Is blorp missing a resolve? glx/glx-copy-sub-buffer samples=16: pass fail glx/glx-copy-sub-buffer samples=2: pass fail glx/glx-copy-sub-buffer samples=4: pass fail glx/glx-copy-sub-buffer samples=6: pass fail glx/glx-copy-sub-buffer samples=8: pass fail spec/!opengl 1.1/read-front clear-front-first samples=16: pass fail spec/!opengl 1.1/read-front clear-front-first samples=2: pass fail spec/!opengl 1.1/read-front clear-front-first samples=4: pass fail spec/!opengl 1.1/read-front clear-front-first samples=6: pass fail spec/!opengl 1.1/read-front clear-front-first samples=8: pass fail spec/!opengl 1.1/read-front samples=16: pass fail spec/!opengl 1.1/read-front samples=2: pass fail spec/!opengl 1.1/read-front samples=4: pass fail spec/!opengl 1.1/read-front samples=6: pass fail spec/!opengl 1.1/read-front samples=8: pass fail Cc: Jason Ekstrand Cc: Topi Pohjolainen Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_blorp.c| 74 ++-- src/mesa/drivers/dri/i965/intel_pixel_read.c | 21 +++--- src/mesa/drivers/dri/i965/intel_tex_image.c | 27 --- 3 files changed, 73 insertions(+), 49 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c index 8c6d77e1b7d..1e65217247b 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.c +++ b/src/mesa/drivers/dri/i965/brw_blorp.c @@ -36,6 +36,7 @@ #include "brw_defines.h" #include "brw_meta_util.h" #include "brw_state.h" +#include "intel_batchbuffer.h" #include "intel_buffer_objects.h" #include "intel_fbo.h" #include "common/gen_debug.h" @@ -769,8 +770,11 @@ blorp_get_client_bo(struct brw_context *brw, GLenum target, GLenum format, GLenum type, const void *pixels, const struct gl_pixelstore_attrib *packing, -uint32_t *offset_out, uint32_t *row_stride_out, -uint32_t *image_stride_out, bool read_only) +bool read_only, +uint32_t *offset_out, +uint32_t *row_stride_out, +uint32_t *image_stride_out, +bool *sync_out) { /* Account for SKIP_PIXELS, SKIP_ROWS, ALIGNMENT, and SKIP_IMAGES */ const GLuint dims = _mesa_get_texture_dimensions(target); @@ -785,14 +789,15 @@ blorp_get_client_bo(struct brw_context *brw, *row_stride_out = stride; *image_stride_out = _mesa_image_image_stride(packing, w, h, format, type); + *sync_out = false; - if (_mesa_is_bufferobj(packing->BufferObj)) { - const uint32_t offset = first_pixel + (intptr_t)pixels; - if (!read_only && ((offset % cpp) || 
(stride % cpp))) { - perf_debug("Bad PBO alignment; fallback to CPU mapping\n"); - return NULL; - } + const uintptr_t offset = first_pixel + (uintptr_t)pixels; + if (!read_only && (offset % cpp || stride % cpp)) { + perf_debug("Bad PBO alignment; fallback to CPU mapping\n"); + return NULL; + } + if (_mesa_is_bufferobj(packing->BufferObj)) { /* This is a user-provided PBO. We just need to get the BO out */ struct intel_buffer_object *intel_pbo = intel_buffer_object(packing->BufferObj); @@ -807,13 +812,10 @@ blorp_get_client_bo(struct brw_context *brw, *offset_out = offset; return bo; - } else { + } else if (read_only) { /* Someone should have already checked that there is data to upload. */ assert(pixels); - /* Creating a temp buffer currently only works for upload */ - assert(read_only); - /* This is not a user-provided PBO. Instead, pixels is a pointer to CPU * data which we need to copy into a BO. */ @@ -826,7 +828,7 @@ blorp_get_client_bo(struct brw_context *brw, return NULL; } - if (b
[Mesa-dev] [PATCH 2/5] i965: Allow creation of brw_bo from system memory (userptr)
Since v3.16 (though universal access was only enabled by default in v4.6), the kernel has offered the ability to wrap any system memory (i.e. RAM and not I/O mapped memory) into an object that can be used by the GPU. The caveat is that this object is marked as cache coherent (so that the client can continue accessing the memory blissfully ignorant of the synchronisation required with the GPU) and on !llc platforms this means that the object is snooped. Snooping imposes a large performance penalty and is only advised to be used for one-off transfers. However, it provides another useful tool in the driver toolbox. --- src/mesa/drivers/dri/i965/brw_bufmgr.c | 70 +++- src/mesa/drivers/dri/i965/brw_bufmgr.h | 8 +++ src/mesa/drivers/dri/i965/intel_screen.c | 20 +++ src/mesa/drivers/dri/i965/intel_screen.h | 1 + 4 files changed, 98 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index 7ac3bcad3da..4132810324a 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -724,6 +724,74 @@ brw_bo_alloc_tiled_2d(struct brw_bufmgr *bufmgr, const char *name, memzone, tiling, stride); } +/* + * Wrap the chunk of client memory given by ptr+size inside a GPU + * buffer, and make it cache coherent (though on non-LLC architectures + * this requires snooping on explicit cache flushes). This allows the + * caller to write into the memory chunk and for those writes to be + * visible on the GPU (exactly as if they create the buffer and then + * persistently mapped it to obtain the pointer). + */ +struct brw_bo * +brw_bo_alloc_userptr(struct brw_bufmgr *bufmgr, + const char *name, void *ptr, uint64_t size) +{ + struct brw_bo *bo = calloc(1, sizeof(*bo)); + if (!bo) + return NULL; + + bo->bufmgr = bufmgr; + bo->name = name; + p_atomic_set(&bo->refcount, 1); + + bo->size = size; + bo->map_cpu = ptr; + bo->userptr = true; + bo->reusable = false; + bo->cache_coherent = true; + bo->idle = true; + bo->kflags = bufmgr->initial_kflags; + + bo->tiling_mode = I915_TILING_NONE; + bo->swizzle_mode = I915_BIT_6_SWIZZLE_NONE; + bo->stride = 0; + + struct drm_i915_gem_userptr arg = { + .user_ptr = (uintptr_t)ptr, + .user_size = size, + .flags = 0, + }; + if (drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_USERPTR, &arg)) { + free(bo); + return NULL; + } + + bo->gem_handle = arg.handle; + + /* Check the buffer for validity before we try and use it in a batch */ + struct drm_i915_gem_set_domain sd = { + .handle = bo->gem_handle, + .read_domains = I915_GEM_DOMAIN_CPU, + }; + if (drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd)) + goto err_free; + + if (brw_using_softpin(bufmgr)) { + bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, size, 1); + if (bo->gtt_offset == 0ull) + goto err_free; + } + + VG_DEFINED(ptr, size); /* Presume we write to it using the GPU */ + return bo; + +err_free: + mtx_lock(&bufmgr->lock); + bo_free(bo); + mtx_unlock(&bufmgr->lock); + return NULL; +} + /** * Returns a brw_bo wrapping the given buffer object handle. 
* @@ -814,7 +882,7 @@ bo_free(struct brw_bo *bo) { struct brw_bufmgr *bufmgr = bo->bufmgr; - if (bo->map_cpu) { + if (bo->map_cpu && !bo->userptr) { VG_NOACCESS(bo->map_cpu, bo->size); drm_munmap(bo->map_cpu, bo->size); } diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h index 8cdf944e3da..586afa2f333 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h @@ -191,6 +191,11 @@ struct brw_bo { * Boolean of whether this buffer is cache coherent */ bool cache_coherent; + + /** +* Boolean of whether this buffer is a userptr +*/ + bool userptr:1; }; #define BO_ALLOC_BUSY (1<<0) @@ -225,6 +230,9 @@ struct brw_bo *brw_bo_alloc_tiled(struct brw_bufmgr *bufmgr, uint32_t pitch, unsigned flags); +struct brw_bo *brw_bo_alloc_userptr(struct brw_bufmgr *bufmgr, +const char *name, void *ptr, uint64_t size); + /** * Allocate a tiled buffer object. * diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index cb357419a77..5359c05f94f 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1889,6 +1889,22 @@ intel_detect_swizzling(struct intel_screen *screen) return swizzle_mode != I915_BIT_6_SWIZZLE_NONE; } +static bool +intel_detect_userptr(struct intel_screen *screen) +{ + struct drm_i915_gem_userptr arg = { + .user_ptr = -4096ULL, + .user_size = 8192, /* invalid 64b wrap around */ + }; + + if (screen->devinfo.has_snoop_bu
[Mesa-dev] [PATCH 1/5] intel: Mark i965g/i965gm as having buggy snoop access
Recent kernels do exclude snoop access for i965g/i965gm as it does not work as advertised. However to avoid depending on a recent kernel for old hardware, mark the presence of the bug in gen_device_info. See kernel commit df0700e53047662c167836bd6fdeea55d5d8dcfa Author: Chris Wilson Date: Wed Sep 6 20:24:24 2017 +0100 drm/i915: Disable snooping (userptr, set-cache-level) on gen4 --- src/intel/dev/gen_device_info.c | 1 + src/intel/dev/gen_device_info.h | 8 2 files changed, 9 insertions(+) diff --git a/src/intel/dev/gen_device_info.c b/src/intel/dev/gen_device_info.c index 8e971329892..7b57a9680af 100644 --- a/src/intel/dev/gen_device_info.c +++ b/src/intel/dev/gen_device_info.c @@ -93,6 +93,7 @@ gen_get_pci_device_id_override(void) static const struct gen_device_info gen_device_info_i965 = { .gen = 4, + .has_snoop_bug = true, .has_negative_rhw_bug = true, .num_slices = 1, .num_subslices = { 1, }, diff --git a/src/intel/dev/gen_device_info.h b/src/intel/dev/gen_device_info.h index 40b72383420..1aa3ea179ad 100644 --- a/src/intel/dev/gen_device_info.h +++ b/src/intel/dev/gen_device_info.h @@ -80,6 +80,14 @@ struct gen_device_info */ bool has_negative_rhw_bug; + /** +* Some specific Intel chipset do not invalidate the CPU cache from the +* GPU for a snooped address, leading to stale data being read by the CPU +* and incorrect results. Enabling this flag will prevent the driver from +* using snooped access, e.g. userptr. +*/ + bool has_snoop_bug; + /** * Some versions of Gen hardware don't do centroid interpolation correctly * on unlit pixels, causing incorrect values for derivatives near triangle -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Fix batch-last mode to properly swap BOs.
Quoting Kenneth Graunke (2018-06-04 11:18:37) > On pre-4.13 kernels, which don't support I915_EXEC_BATCH_FIRST, we move > the validation list entry to the end...but incorrectly left the exec_bo > array alone, causing a mismatch where exec_bos[0] no longer corresponded > with validation_list[0] (and similarly for the last entry). > > One example of resulting breakage is that we'd update bo->gtt_offset > based on the wrong buffer. This wreaked total havoc when trying to use > softpin, and likely caused unnecessary relocations in the normal case. > > Fixes: 29ba502a4e28471f67e4e904ae503157087efd20 (i965: Use > I915_EXEC_BATCH_FIRST when available.) Reviewed-by: Chris Wilson One thing that may have helped is if we do the post-execbuf processing in submit_batch; execbuffer() then just becomes stuffing the pointers into struct drm_i915_gem_execbuffer2 and calling the ioctl. At the moment, it isn't clear where batch->exec_bos was being used again, as one expects it to be consumed by submit_batch(). -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965: Split merge_inputs and clear_buffers.
Quoting mathias.froehl...@gmx.net (2018-05-17 07:38:27) > From: Mathias Fröhlich > > The merge_inputs function handles that part that changes when the > inputs change. The clear_buffers function triggers when we may need > a new upload. Thus the merge_inputs can be limited to be once > per brw_draw_prims. > > Signed-off-by: Mathias Fröhlich > --- > src/mesa/drivers/dri/i965/brw_draw.c | 30 ++ > 1 file changed, 22 insertions(+), 8 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_draw.c > b/src/mesa/drivers/dri/i965/brw_draw.c > index ae3b7be2dd..2a7562a684 100644 > --- a/src/mesa/drivers/dri/i965/brw_draw.c > +++ b/src/mesa/drivers/dri/i965/brw_draw.c > @@ -278,21 +278,34 @@ brw_emit_prim(struct brw_context *brw, > > > static void > -brw_merge_inputs(struct brw_context *brw) > +brw_clear_buffers(struct brw_context *brw) > { > - const struct gen_device_info *devinfo = &brw->screen->devinfo; > - const struct gl_context *ctx = &brw->ctx; > - GLuint i; > - > - for (i = 0; i < brw->vb.nr_buffers; i++) { > + for (unsigned i = 0; i < brw->vb.nr_buffers; ++i) { >brw_bo_unreference(brw->vb.buffers[i].bo); >brw->vb.buffers[i].bo = NULL; > } > brw->vb.nr_buffers = 0; > > + for (unsigned i = 0; i < brw->vb.nr_enabled; ++i) { > + brw->vb.enabled[i]->buffer = -1; > + } Hmm, can we have an enabled buffer that is not in the set of brw->vb.buffers[]? -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.
Quoting mathias.froehl...@gmx.net (2018-05-17 07:38:26) > From: Mathias Fröhlich > > Avoid looping over all VARYING_SLOT_MAX urb_setup array > entries from genX_upload_sbe. Prepare an array indirection > to the active entries of urb_setup already in the compile > step. On upload only walk the active arrays. > > Signed-off-by: Mathias Fröhlich > --- > src/intel/compiler/brw_compiler.h | 7 +++ > src/intel/compiler/brw_fs.cpp | 23 +++ > src/intel/compiler/brw_fs.h | 2 ++ > src/intel/compiler/brw_fs_visitor.cpp | 1 + > src/mesa/drivers/dri/i965/genX_state_upload.c | 7 +++ > 5 files changed, 36 insertions(+), 4 deletions(-) > > diff --git a/src/intel/compiler/brw_compiler.h > b/src/intel/compiler/brw_compiler.h > index 8b4e6fe2e2..a9df45e00d 100644 > --- a/src/intel/compiler/brw_compiler.h > +++ b/src/intel/compiler/brw_compiler.h > @@ -743,6 +743,13 @@ struct brw_wm_prog_data { > * For varying slots that are not used by the FS, the value is -1. > */ > int urb_setup[VARYING_SLOT_MAX]; > + /** > +* Cache structure into the urb_setup array above that contains the > +* attribute numbers of active varyings out of urb_setup. > +* The actual count is stored in urb_setup_attribs_count. > +*/ > + int urb_setup_attribs[VARYING_SLOT_MAX]; > + int urb_setup_attribs_count; > }; > > struct brw_push_const_block { > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index b21996c168..6930414067 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup() > this->first_non_payload_grf = payload.num_regs + > prog_data->curb_read_length; > } > > +/* > + * Build up an array of indices into the urb_setup array that > + * references the active entries of the urb_setup array. > + * Used to accelerate walking the active entries of the urb_setup array > + * on each upload. > + */ > +void > +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data) > +{ > + int index = 0; > + for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) { > + int input_index = wm_prog_data->urb_setup[attr]; > + if (input_index < 0) > + continue; > + wm_prog_data->urb_setup_attribs[index++] = attr; So far the only consumer of this wants the input_index again. Does that change, or is it worth including both to avoid the trawl? Is uint8_t (with a STATIC_ASSERT) good enough? -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
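For reference, the uint8_t variant suggested here is a two-line change, since varying slot indices comfortably fit in a byte; a sketch against the fields added by this patch (with STATIC_ASSERT from util/macros.h guarding the assumption):

   /* In brw_compiler.h -- a byte per active slot is plenty: */
   uint8_t urb_setup_attribs[VARYING_SLOT_MAX];
   uint8_t urb_setup_attribs_count;

   /* And in brw_compute_urb_setup_index(), make the assumption explicit: */
   STATIC_ASSERT(VARYING_SLOT_MAX <= UINT8_MAX);

Storing the input_index alongside each attribute would double that again and still stay tiny, if the extra urb_setup lookup in genX_upload_sbe turns out to matter.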
Re: [Mesa-dev] [PATCH 3/3] i965/miptree: Stop retiling miptrees
Quoting Nanley Chery (2018-05-30 21:44:35) > We previously retiled miptrees to work around limitations of the BLT > engine. BLORP fallbacks can overcome these, so we no longer have need > for retiling. > > Removing retiling fixes a number of problems. If the row pitch was too > wide for the BLT engine, we retiled to linear and had the following > issues: > * We retiled on gen6+ platforms which don't actually use the blitter. > * We ignored miptree_create_for_bo's requests for tiled miptrees. > > I don't know how to write a test for the last issue unfortunately. Also, > I haven't nominated this for stable releases, because of the amount of > churn needed - we'd have to pull in the series which stops using the > blitter on gen6+. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106738 > --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 62 +-- > 1 file changed, 1 insertion(+), 61 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index 1c888d5210b..a57720b338a 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -509,46 +509,6 @@ free_aux_state_map(enum isl_aux_state **state) > free(state); > } > > -static bool > -need_to_retile_as_linear(struct brw_context *brw, unsigned row_pitch, > - enum isl_tiling tiling, unsigned samples) > -{ > - if (samples > 1) > - return false; > - > - if (tiling == ISL_TILING_LINEAR) > - return false; > - > -/* If the width is much smaller than a tile, don't bother tiling. */ > - if (row_pitch < 64) > - return true; > - > - if (ALIGN(row_pitch, 512) >= 32768) { For stable, you could at least fix these checks. The BLT limit is 32767*4 for tiled surfaces, or borrow blt_pitch() from intel_blit.c (or export the checker from intel_blit.c) -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
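As a rough illustration of the pitch limit mentioned above, a hypothetical check in the spirit of blt_pitch() (the exact field widths and units are assumptions here; consult intel_blit.c for the real logic):

#include <stdbool.h>
#include <stdint.h>

static bool
blt_pitch_in_range(uint32_t row_pitch_bytes, bool tiled)
{
   /* The BLT command takes the pitch in dwords for tiled surfaces and in
    * bytes for linear ones, and the field only holds values below 32768,
    * hence the 32767 * 4 byte limit for tiled surfaces. */
   const uint32_t pitch = tiled ? row_pitch_bytes / 4 : row_pitch_bytes;
   return pitch < 32768;
}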
[Mesa-dev] [PATCH] i965: Only emit VF cache invalidations when the high bits change
Commit 92f01fc5f914 ("i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.") tried to only emit the VF invalidate if the high bits changed, but it accidentally always set need_invalidate to true, causing it to unconditionally emit the pipe control before every primitive. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106708 Cc: Kenneth Graunke Cc: Scott D Phillips Cc: Eero Tamminen --- src/mesa/drivers/dri/i965/genX_state_upload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c index b485e2cf811..88fde9d12fd 100644 --- a/src/mesa/drivers/dri/i965/genX_state_upload.c +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c @@ -504,7 +504,7 @@ static void vf_invalidate_for_vb_48bit_transitions(struct brw_context *brw) { #if GEN_GEN >= 8 - bool need_invalidate = true; + bool need_invalidate = false; unsigned i; for (i = 0; i < brw->vb.nr_buffers; i++) { -- 2.17.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
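For context, a rough self-contained sketch of the intended behaviour (this is not the Mesa function itself; names and parameters are stand-ins):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool
need_vf_invalidate(const uint64_t *bound_addrs,  /* current VB addresses */
                   uint64_t *last_high_bits,     /* previously programmed */
                   size_t nr_buffers)
{
   bool need_invalidate = false;   /* the bug was starting this at 'true' */

   for (size_t i = 0; i < nr_buffers; i++) {
      const uint64_t high_bits = bound_addrs[i] >> 32;
      if (high_bits != last_high_bits[i]) {
         last_high_bits[i] = high_bits;
         need_invalidate = true;   /* only invalidate on a real transition */
      }
   }
   return need_invalidate;
}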
[Mesa-dev] [PATCH] i965: Set the batch threshold based on the GTT size
The limit to the working set of a single batch is the amount we can fit into the GTT. The GTT is a per-context address space, so query the context rather than use an estimate based on the legacy global aperture. Furthermore, we can fine-tune our soft-limit based on the knowledge of whether we are sharing the GTT with other clients. --- src/mesa/drivers/dri/i965/intel_screen.c | 55 +--- 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index e6d49b94268..a7e0c20c9ca 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1420,15 +1420,58 @@ static const __DRIimageExtension intelImageExtension = { .queryDmaBufFormatModifierAttribs = intel_query_format_modifier_attribs, }; +static int +gem_param(int fd, int name) +{ + int v = -1; /* No param uses (yet) the sign bit, reserve it for errors */ + + struct drm_i915_getparam gp = { .param = name, .value = &v }; + if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp)) + return -1; + + return v; +} + +static int +gem_context_getparam(int fd, uint32_t context, uint64_t param, uint64_t *value) +{ + struct drm_i915_gem_context_param gp = { + .ctx_id = context, + .param = param, + }; + + if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &gp)) + return -1; + + *value = gp.value; + + return 0; +} + static uint64_t -get_aperture_size(int fd) +get_batch_threshold(int fd) { - struct drm_i915_gem_get_aperture aperture; + uint64_t gtt_size; + + /* I915_CONTEXT_PARAM_GTT_SIZE was introduced in kernel v4.5. Prior to that +* be cautious and assume that we only have the smallest 256MiB GTT. +* This is a soft-limit we use to flush the batch before it grows too large +* and tries to claim the entire system for itself. (It is still allowed +* to use as much as it requires of the GTT for a single primitive call, +* with the caveat that it may fail if we cannot fit it all in.) +*/ + if (gem_context_getparam(fd, 0, I915_CONTEXT_PARAM_GTT_SIZE, &gtt_size)) + gtt_size = 256 << 20; - if (drmIoctl(fd, DRM_IOCTL_I915_GEM_GET_APERTURE, &aperture) != 0) - return 0; + /* If we have to share the GTT with other clients, be conservative and +* set ourselves a lower soft-limit so we emit batches more often to avoid +* having multiple clients compete for hogging the entire system to +* themselves. +*/ + if (gem_param(fd, I915_PARAM_HAS_ALIASING_PPGTT) < 2) + gtt_size /= 2; - return aperture.aper_size; + return 3 * gtt_size / 4; } static int @@ -2482,7 +2525,7 @@ __DRIconfig **intelInitScreen2(__DRIscreen *dri_screen) screen->max_gtt_map_object_size = gtt_size / 4; } - screen->aperture_threshold = get_aperture_size(dri_screen->fd) * 3 / 4; + screen->aperture_threshold = get_batch_threshold(dri_screen->fd); screen->hw_has_swizzling = intel_detect_swizzling(screen); screen->hw_has_timestamp = intel_detect_timestamp(screen); -- 2.17.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
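To spell out the soft-limit arithmetic from the patch above, a small sketch (the fallback size and the halving follow the commit message; the concrete numbers are only illustrative):

#include <stdbool.h>
#include <stdint.h>

static uint64_t
batch_threshold(uint64_t context_gtt_size, bool gtt_is_shared)
{
   if (context_gtt_size == 0)
      context_gtt_size = 256ull << 20;   /* pre-v4.5 kernel: assume 256 MiB */

   if (gtt_is_shared)
      context_gtt_size /= 2;             /* leave room for other clients */

   /* Flush once a batch approaches 3/4 of what we can map at once, e.g.
    * a full 4 GiB per-context GTT yields a 3 GiB threshold, while the
    * shared 256 MiB fallback yields 96 MiB. */
   return 3 * context_gtt_size / 4;
}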
[Mesa-dev] i965: Enable fast detiling paths for !llc
Just a small series to put the new cache-line read back to good use for ye olde Xorg on bxt (and older/newer with very similar effect).

From:
 4 trep @ 0.7007 msec ( 1430.0/sec): ShmPutImage 500x500 square
 4000 trep @ 9.0367 msec ( 111.0/sec): ShmGetImage 500x500 square
to:
 6 trep @ 0.5084 msec ( 1970.0/sec): ShmPutImage 500x500 square
 12000 trep @ 2.4808 msec ( 403.0/sec): ShmGetImage 500x500 square

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] i965: Push the format checks to intel_tiled_memcpy
Allow the tiled_memcpy backend to determine if it is able to copy between the source and destination pixel buffer. This allows us to eliminate some duplication in the callers, and permits us to be more flexible in checking for compatible formats. (Hmm, is sRGB handling right?) --- src/mesa/drivers/dri/i965/intel_pixel_read.c | 16 +-- src/mesa/drivers/dri/i965/intel_tex_image.c | 46 +++- .../drivers/dri/i965/intel_tiled_memcpy.c | 108 +++--- .../drivers/dri/i965/intel_tiled_memcpy.h | 17 ++- 4 files changed, 102 insertions(+), 85 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c b/src/mesa/drivers/dri/i965/intel_pixel_read.c index 6ed7895bc76..a545d215ad6 100644 --- a/src/mesa/drivers/dri/i965/intel_pixel_read.c +++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c @@ -86,15 +86,12 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx, /* The miptree's buffer. */ struct brw_bo *bo; - uint32_t cpp; - mem_copy_fn mem_copy = NULL; /* This fastpath is restricted to specific renderbuffer types: * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support * more types. */ if (!devinfo->has_llc || - !(type == GL_UNSIGNED_BYTE || type == GL_UNSIGNED_INT_8_8_8_8_REV) || pixels == NULL || _mesa_is_bufferobj(pack->BufferObj) || pack->Alignment > 4 || @@ -117,15 +114,9 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx, if (rb->NumSamples > 1) return false; - /* We can't handle copying from RGBX or BGRX because the tiled_memcpy -* function doesn't set the last channel to 1. Note this checks BaseFormat -* rather than TexFormat in case the RGBX format is being simulated with an -* RGBA format. -*/ - if (rb->_BaseFormat == GL_RGB) - return false; - - if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp)) + mem_copy_fn mem_copy = + intel_get_memcpy(rb->Format, format, type, INTEL_DOWNLOAD); + if (mem_copy == NULL) return false; if (!irb->mt || @@ -198,6 +189,7 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx, pack->Alignment, pack->RowLength, pack->SkipPixels, pack->SkipRows); + uint32_t cpp = _mesa_get_format_bytes(rb->Format); tiled_to_linear( xoffset * cpp, (xoffset + width) * cpp, yoffset, yoffset + height, diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c index fae179214dd..de8832812c1 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_image.c +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c @@ -186,13 +186,6 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, struct brw_context *brw = brw_context(ctx); const struct gen_device_info *devinfo = &brw->screen->devinfo; struct intel_texture_image *image = intel_texture_image(texImage); - int src_pitch; - - /* The miptree's buffer. */ - struct brw_bo *bo; - - uint32_t cpp; - mem_copy_fn mem_copy = NULL; /* This fastpath is restricted to specific texture types: * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support @@ -204,7 +197,6 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, * we need tests. 
*/ if (!devinfo->has_llc || - !(type == GL_UNSIGNED_BYTE || type == GL_UNSIGNED_INT_8_8_8_8_REV) || !(texImage->TexObject->Target == GL_TEXTURE_2D || texImage->TexObject->Target == GL_TEXTURE_RECTANGLE) || pixels == NULL || @@ -222,7 +214,12 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, if (ctx->_ImageTransferState) return false; - if (!intel_get_memcpy(texImage->TexFormat, format, type, &mem_copy, &cpp)) + if (format == GL_COLOR_INDEX) + return false; + + mem_copy_fn mem_copy = + intel_get_memcpy(texImage->TexFormat, format, type, INTEL_UPLOAD); + if (mem_copy == NULL) return false; /* If this is a nontrivial texture view, let another path handle it instead. */ @@ -257,7 +254,7 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, intel_miptree_access_raw(brw, image->mt, level, 0, true); - bo = image->mt->bo; + struct brw_bo *bo = image->mt->bo; if (brw_batch_references(&brw->batch, bo)) { perf_debug("Flushing before mapping a referenced bo.\n"); @@ -270,7 +267,7 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, return false; } - src_pitch = _mesa_image_row_stride(packing, width, format, type); + int src_pitch = _mesa_image_row_stride(packing, width, format, type); /* We postponed printing this message until having committed to executing * the function. @@ -289,6 +286,8 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx, xoffset += level_x; yoffset += level_y; + uint32_t cpp = _mesa_get_format_bytes(texImage->TexFormat); + linear_to_tiled( xoffset * cpp, (xoffset + width) * cpp, yoffset, yoffset + hei