On Fri, Oct 06, 2017 at 11:06:37AM +0200, Daniel Vetter wrote:

> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.

synchronize_*() provides an smp_mb() on the calling CPU and ensures an
smp_mb() on all other CPUs before completion, such that everybody agrees
on the state prior to calling synchronize_rcu(). So yes, no additional
ordering requirements.

>  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
>  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>  3 files changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..e79a6ca60265 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>       intel_engine_init_global_seqno(request->engine, request->global_seqno);
>  }
>  
> +static void engine_complete_requests(struct intel_engine_cs *engine)
>  {
>       /* Mark all executing requests as skipped */
>       engine->cancel_requests(engine);
>  
> @@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>                                      intel_engine_last_submit(engine));
>  }
>  
> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>  {
>       struct intel_engine_cs *engine;
>       enum intel_engine_id id;
>  
>       for_each_engine(engine, i915, id)
> +             engine->submit_request = nop_submit_request;
>  
> +     /* Make sure no one is running the old callback before we proceed with
> +      * cancelling requests and resetting the completion tracking. Otherwise
> +      * we might submit a request to the hardware which never completes.
> +      */

ARGH @ horrid comment style..

  http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html

:-)

> +     synchronize_rcu();
>  
> +     for_each_engine(engine, i915, id)
> +             engine_complete_requests(engine);
>  
> +     set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +     wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
>  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b100b38f1dd2..ef78a85cb845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>       switch (state) {
>       case FENCE_COMPLETE:
>               trace_i915_gem_request_submit(request);
> +             rcu_read_lock();
>               request->engine->submit_request(request);
> +             rcu_read_unlock();
>               break;
>  
>       case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>       }
>       i915_gem_request_get(vip);
>       i915_add_request(vip);
> +     rcu_read_lock();
>       request->engine->submit_request(request);
> +     rcu_read_unlock();
>  
>       mutex_unlock(&i915->drm.struct_mutex);

Yes, this is a correct and good replacement, however, you said:

> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

This means that you could simply use synchronize_sched() and rely on the
existing preempt-disabled regions (the irqsave spinlocks in the callbacks),
without adding the rcu_read_lock()s, and still be fine.
