> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> Sent: Thursday, December 05, 2013 2:43 PM
> To: Intel Graphics Development
> Cc: Daniel Vetter; Lister, Ian; sta...@vger.kernel.org; Widawsky, Benjamin;
> Stéphane Marchesin; Bloomfield, Jon
> Subject: [PATCH] drm/i915: Fix use-after-free in do_switch
> 
> So apparently under ridiculous amounts of memory pressure we can get into
> trouble in do_switch when we try to move the old hw context backing
> storage object onto the active lists.
> 
> With list debugging enabled that usually results in us chasing a poisoned
> pointer - which means we've hit upon a vma that has been removed from all
> lrus with list_del (and then deallocated, so it's a real use-after free).
> 
> Ian Lister has done some great callchain chasing and noticed that we can
> reenter do_switch:
> 
> i915_gem_do_execbuffer()
> 
> i915_switch_context()
> 
> do_switch()
>    from = ring->last_context;
>    i915_gem_object_pin()
> 
>       i915_gem_object_bind_to_gtt()
>          ret = drm_mm_insert_node_in_range_generic();
>          // If the above call fails then it will try 
> i915_gem_evict_something()
>          // If that fails it will call i915_gem_evict_everything() ...
>        i915_gem_evict_everything()
>           i915_gpu_idle()
>              i915_switch_context(DEFAULT_CONTEXT)
> 
> Like with everything else where the shrinker or eviction code can invalidate
> pointers we need to reload relevant state.
> 
> Note that there's no need to recheck whether a context switch is still
> required because:
> 
> - Doing a switch to the same context is harmless (besides wasting a
>   bit of energy).
> 
> - This can only happen with the default context. But since that one's
>   pinned we'll never call down into evict_everything under normal
>   circumstances. Note that there's a little driver bringup fun
>   involved namely that we could recourse into do_switch for the
>   initial switch. Atm we're fine since we assign the context pointer
>   only after the call to do_switch at driver load or resume time. And
>   in the gpu reset case we skip the entire setup sequence (which might
>   be a bug on its own, but definitely not this one here).
> 
> Cc'ing stable since apparently ChromeOS guys are seeing this in the wild (and
> not just on artificial stress tests), see the reference.
> 
> Note that in upstream code doesn't calle evict_everything directly from
> evict_something, that's an extension in this product branch. But we can still
> hit upon this bug (and apparently we do, see the linked backtraces). I've
> noticed this while trying to construct a testcase for this bug and utterly 
> failed
> to provoke it. It looks like we need to driver the system squarly into the
> lowmem wall and provoke the shrinker to evict the context object by doing
> the last-ditch evict_everything call.
> 
> Aside: There's currently no means to get a badly-fragmenting hw context
> object away from a bad spot in the upstream code. We should fix this by at
> least adding some code to evict_something to handle hw contexts.
> 
> References: https://code.google.com/p/chromium/issues/detail?id=248191
> Reported-by: Ian Lister <ian.lis...@intel.com>
> Cc: Ian Lister <ian.lis...@intel.com>
> Cc: sta...@vger.kernel.org
> Cc: Ben Widawsky <benjamin.widaw...@intel.com>
> Cc: Stéphane Marchesin <marc...@chromium.org>
> Cc: Bloomfield, Jon <jon.bloomfi...@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>

Reviewed-by: Ian Lister <ian.lis...@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 41877045a1a0..2d2877493f61 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -421,11 +421,21 @@ static int do_switch(struct i915_hw_context *to)
>       if (ret)
>               return ret;
> 
> -     /* Clear this page out of any CPU caches for coherent swap-in/out.
> Note
> +     /*
> +      * Pin can switch back to the default context if we end up calling into
> +      * evict_everything - as a last ditch gtt defrag effort that also
> +      * switches to the default context. Hence we need to reload from
> here.
> +      */
> +     from = ring->last_context;
> +
> +     /*
> +      * Clear this page out of any CPU caches for coherent swap-in/out.
> +Note
>        * that thanks to write = false in this call and us not setting any gpu
>        * write domains when putting a context object onto the active list
>        * (when switching away from it), this won't block.
> -      * XXX: We need a real interface to do this instead of trickery. */
> +      *
> +      * XXX: We need a real interface to do this instead of trickery.
> +      */
>       ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
>       if (ret) {
>               i915_gem_object_unpin(to->obj);
> --
> 1.8.4.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to