> From: "Sven M. Hallberg" <pe...@khjk.org>
> Date: Mon, 03 Jun 2019 21:20:29 +0200
> 
> I've spent some more time poking through the drm code and think I have a
> good picture now of what is going on.
> 
> The inteldrm (i915) driver is essentially implementing a three-stage
> pipeline: setup - hardware - cleanup.
> 
> The first and second stages are performed in the originating
> thread/process (so usually an ioctl from userspace) unless
> nonblocking behavior has been requested. In that case, only setup is
> performed in the originating context and hardware is deferred to either
> 'system_unbound_wq' or a special 'modeset_wq'.
> 
> sys/dev/pci/drm/i915/intel_display.c intel_atomic_commit():
> 
>     if (nonblock && intel_state->modeset) {
>             queue_work(dev_priv->modeset_wq, &state->commit_work);
>     } else if (nonblock) {
>             queue_work(system_unbound_wq, &state->commit_work);
>     } else {
>             if (intel_state->modeset)
>                 flush_workqueue(dev_priv->modeset_wq);
>             intel_atomic_commit_tail(state);
>     }
> 
> This mirrors the generic (reference/example?) implementation
> sys/dev/pci/drm/drm_atomic_helper.c drm_atomic_helper_commit():
> 
>     if (nonblock)
>             queue_work(system_unbound_wq, &state->commit_work);
>     else
>             commit_tail(state);
> 
> Finally, the cleanup stage is deferred by i915 to 'system_wq'.
> sys/dev/pci/drm/i915/intel_display.c intel_atomic_commit_tail():
> 
>     INIT_WORK(&state->commit_work, intel_atomic_cleanup_work);
>     schedule_work(&state->commit_work);

Thanks for helping figure this out.

> Mark Kettenis on Thu, May 30 2019:
> > I wonder whether the issue is that on OpenBSD we use a single-threaded
> > task queue, where Linux is pretty much always multi-threaded.  That
> > means that on OpenBSD, if a task blocks, no other task can run.  At
> > some point I ran into that issue, but worked around it by using an
> > additional task queue.
> 
> I'm pretty sure that this is accurate and using an additional queue
> would fix it (see below).
> 
> > I believe Linux uses a thread per cpu.  I think that means there is
> > actually a bug in the Linux code somewhere but nobody notices because
> > most systems have more than one cpu.
> 
> Yes, I think so, unless Linux dynamically spawns new workers when
> the "worker pool" runs out or something.
> 
> > Making the task queues multi-threaded on OpenBSD is hard since it
> > breaks synchronization mechanisms such as taskq_barrier(9).
> 
> Agreed.
> 
> 
> In summary, I see three main options for fixing this:
> 
> 1. Multithread the work queue somehow. A few tricks come to mind, but
>    like you say, it's going to be fiddly.*
> 
> 2. Put the cleanup tasks in their own queue. The rationale being that
>    this properly assigns a worker to the third stage of the pipeline.
> 
> 3. Only defer cleanups in the nonblocking case. Rationale: This is
>    arguably expected behavior out of a blocking call and if you care
>    about the latency introduced by the cleanup, then you probably also
>    care about the setup and are doing nonblocking calls already.**
> 
>    Note that the nonblocking case will not stall, because nonblocking
>    commits go on system_unbound_wq or modeset_wq. Cleanups therefore end
>    up separated on system_wq. In fact, there is a comment that seems to
>    echo this in drm_atomic_helper_commit():
> 
>        /* [...]
>         * NOTE: Commit work has multiple phases, first hardware commit, then
>         * cleanup. We want them to overlap, hence need system_unbound_wq to
>         * make sure work items don't artifically stall on each another.
>         */
> 
> I'm attaching diffs for options 2 and 3. Both work for me.

Nice.  Nevertheless, I think we need to implement option 1.  We want
to keep the local diffs to a bare minimum so that we can easily diff
the code against the Linux DRM code.

I have some ideas, essentially wrapping the native OpenBSD task/taskq
interface to implement a more Linuxy behaviour.  But it will take me
some time to figure out the details.

> * For example: In wait_for_completion(), before going to sleep, check
>   whether we are a queue worker and if so, quickly spawn another to
>   continue in our place.
> 
> ** I guess the prime example would be fancy compositors that potentially
>    shout an ioctl at the kernel every frame. I have not checked whether
>    they do blocking or nonblocking calls.
> 
> 
> [2: text/x-patch]
> 
> Index: dev/pci/drm/i915/i915_drv.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.c,v
> retrieving revision 1.118
> diff -u -p -r1.118 i915_drv.c
> --- dev/pci/drm/i915/i915_drv.c       8 May 2019 15:55:56 -0000       1.118
> +++ dev/pci/drm/i915/i915_drv.c       3 Jun 2019 19:04:30 -0000
> @@ -869,8 +869,14 @@ static int i915_workqueues_init(struct d
>       if (dev_priv->hotplug.dp_wq == NULL)
>               goto out_free_wq;
>  
> +     dev_priv->cleanup_wq = alloc_ordered_workqueue("i915-cleanup", 0);
> +     if (dev_priv->cleanup_wq == NULL)
> +             goto out_free_wq2;
> +
>       return 0;
>  
> +out_free_wq2:
> +     destroy_workqueue(dev_priv->hotplug.dp_wq);
>  out_free_wq:
>       destroy_workqueue(dev_priv->wq);
>  out_err:
> Index: dev/pci/drm/i915/i915_drv.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.h,v
> retrieving revision 1.82
> diff -u -p -r1.82 i915_drv.h
> --- dev/pci/drm/i915/i915_drv.h       4 May 2019 11:34:47 -0000       1.82
> +++ dev/pci/drm/i915/i915_drv.h       3 Jun 2019 19:04:30 -0000
> @@ -1876,6 +1876,9 @@ struct inteldrm_softc {
>       /* ordered wq for modesets */
>       struct workqueue_struct *modeset_wq;
>  
> +     /* ordered wq for cleanups */
> +     struct workqueue_struct *cleanup_wq;
> +
>       /* Display functions */
>       struct drm_i915_display_funcs display;
>  
> Index: dev/pci/drm/i915/intel_display.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_display.c,v
> retrieving revision 1.63
> diff -u -p -r1.63 intel_display.c
> --- dev/pci/drm/i915/intel_display.c  14 Apr 2019 10:14:52 -0000      1.63
> +++ dev/pci/drm/i915/intel_display.c  3 Jun 2019 19:04:31 -0000
> @@ -12761,9 +12761,14 @@ static void intel_atomic_commit_tail(str
>        * deferring to a new worker seems overkill, but we would place a
>        * schedule point (cond_resched()) here anyway to keep latencies
>        * down.
> +      *
> +      * NOTE: Some driver code paths (e.g. i915_driver_register) perform
> +      * blocking commits (via drm_atomic_commit) from system_wq tasks while
> +      * nonblocking commits use system_unbound_wq or our own modeset_wq.
> +      * Deferring cleanup to the same queue could cause workers to deadlock.
>        */
>       INIT_WORK(&state->commit_work, intel_atomic_cleanup_work);
> -     schedule_work(&state->commit_work);
> +     queue_work(dev_priv->cleanup_wq, &state->commit_work);
>  }
>  
>  static void intel_atomic_commit_work(struct work_struct *work)
> 
> [3: text/x-patch]
> 
> Index: dev/pci/drm/i915/intel_display.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_display.c,v
> retrieving revision 1.63
> diff -u -p -r1.63 intel_display.c
> --- dev/pci/drm/i915/intel_display.c  14 Apr 2019 10:14:52 -0000      1.63
> +++ dev/pci/drm/i915/intel_display.c  3 Jun 2019 18:33:39 -0000
> @@ -12586,10 +12586,8 @@ static void intel_atomic_commit_fence_wa
>  #endif
>  }
>  
> -static void intel_atomic_cleanup_work(struct work_struct *work)
> +static void intel_atomic_cleanup(struct drm_atomic_state *state)
>  {
> -     struct drm_atomic_state *state =
> -             container_of(work, struct drm_atomic_state, commit_work);
>       struct drm_i915_private *i915 = to_i915(state->dev);
>  
>       drm_atomic_helper_cleanup_planes(&i915->drm, state);
> @@ -12599,7 +12597,15 @@ static void intel_atomic_cleanup_work(st
>       intel_atomic_helper_free_state(i915);
>  }
>  
> -static void intel_atomic_commit_tail(struct drm_atomic_state *state)
> +static void intel_atomic_cleanup_work(struct work_struct *work)
> +{
> +     struct drm_atomic_state *state =
> +             container_of(work, struct drm_atomic_state, commit_work);
> +
> +     intel_atomic_cleanup(state);
> +}
> +
> +static void intel_atomic_commit_tail(struct drm_atomic_state *state, bool nonblock)
>  {
>       struct drm_device *dev = state->dev;
>       struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> @@ -12754,16 +12760,22 @@ static void intel_atomic_commit_tail(str
>               intel_display_power_put(dev_priv, POWER_DOMAIN_MODESET);
>       }
>  
> -     /*
> -      * Defer the cleanup of the old state to a separate worker to not
> -      * impede the current task (userspace for blocking modesets) that
> -      * are executed inline. For out-of-line asynchronous modesets/flips,
> -      * deferring to a new worker seems overkill, but we would place a
> -      * schedule point (cond_resched()) here anyway to keep latencies
> -      * down.
> -      */
> -     INIT_WORK(&state->commit_work, intel_atomic_cleanup_work);
> -     schedule_work(&state->commit_work);
> +     if (nonblock) {
> +             /*
> +              * Defer the cleanup of the old state to a separate worker to
> +              * keep latencies down.
> +              */
> +             INIT_WORK(&state->commit_work, intel_atomic_cleanup_work);
> +             schedule_work(&state->commit_work);
> +     } else {
> +             /*
> +              * NOTE: Some driver code paths (e.g. i915_driver_register)
> +              * perform blocking commits (via drm_atomic_commit) from
> +              * system_wq tasks. Deferring cleanup to system_wq as well
> +              * could cause workers to deadlock.
> +              */
> +             intel_atomic_cleanup(state);
> +     }
>  }
>  
>  static void intel_atomic_commit_work(struct work_struct *work)
> @@ -12771,7 +12783,7 @@ static void intel_atomic_commit_work(str
>       struct drm_atomic_state *state =
>               container_of(work, struct drm_atomic_state, commit_work);
>  
> -     intel_atomic_commit_tail(state);
> +     intel_atomic_commit_tail(state, true);
>  }
>  
>  static int __i915_sw_fence_call
> @@ -12906,7 +12918,7 @@ static int intel_atomic_commit(struct dr
>       } else {
>               if (intel_state->modeset)
>                       flush_workqueue(dev_priv->modeset_wq);
> -             intel_atomic_commit_tail(state);
> +             intel_atomic_commit_tail(state, false);
>       }
>  
>       return 0;
