Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-05 Thread Tejun Heo
Hello,

On Tue, Sep 05, 2017 at 02:43:14PM +0100, Chris Wilson wrote:
> > Can't you use cancel[_delayed]_work_sync()?
> 
> We then need a loop like:
> 
>   do {
>   if (cancel_delayed_work_sync(wrk))
>   do_work(wrk);
>   else
>   break;
>   } while (1);
> 
> We do want the flush semantics.

I see.  Heh, I don't know.  One thing you can try to do is putting
them on a separate workqueue and use drain_workqueue() or
destroy_workqueue() on it.  Those functions expect there to be some
requeueing but warns if there are too much (non configurable now but
we can add if necessary).

Thanks.

-- 
tejun
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-05 Thread Chris Wilson
Quoting Tejun Heo (2017-09-05 14:36:28)
> On Mon, Sep 04, 2017 at 10:35:49AM +0200, Daniel Vetter wrote:
> > On Fri, Sep 01, 2017 at 03:11:23PM +0100, Chris Wilson wrote:
> > > If a worker requeues itself, it may switch to a different kworker pool,
> > > which flush_work() considers as complete. To be strict, we then need to
> > > keep flushing the work until it is no longer pending.
> > > 
> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
> > > Signed-off-by: Chris Wilson 
> > > Cc: Mika Kuoppala 
> > 
> > Shouldn't this be a thing the workqueue subsystem exposes? Adding Tejun et
> > al.
> > -Daniel
> 
> Can't you use cancel[_delayed]_work_sync()?

We then need a loop like:

do {
if (cancel_delayed_work_sync(wrk))
do_work(wrk);
else
break;
} while (1);

We do want the flush semantics.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-05 Thread Tejun Heo
On Mon, Sep 04, 2017 at 10:35:49AM +0200, Daniel Vetter wrote:
> On Fri, Sep 01, 2017 at 03:11:23PM +0100, Chris Wilson wrote:
> > If a worker requeues itself, it may switch to a different kworker pool,
> > which flush_work() considers as complete. To be strict, we then need to
> > keep flushing the work until it is no longer pending.
> > 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
> > Signed-off-by: Chris Wilson 
> > Cc: Mika Kuoppala 
> 
> Shouldn't this be a thing the workqueue subsystem exposes? Adding Tejun et
> al.
> -Daniel

Can't you use cancel[_delayed]_work_sync()?

Thanks.

-- 
tejun
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-04 Thread Chris Wilson
Quoting Daniel Vetter (2017-09-04 09:35:49)
> On Fri, Sep 01, 2017 at 03:11:23PM +0100, Chris Wilson wrote:
> > If a worker requeues itself, it may switch to a different kworker pool,
> > which flush_work() considers as complete. To be strict, we then need to
> > keep flushing the work until it is no longer pending.
> > 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
> > Signed-off-by: Chris Wilson 
> > Cc: Mika Kuoppala 
> 
> Shouldn't this be a thing the workqueue subsystem exposes? Adding Tejun et
> al.

The semantics are so horrible that I wouldn't suggest that the core
should expose such a nasty trap. You have to be absolutely certain that
your work stops requeueing itself.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-04 Thread Chris Wilson
Quoting Mika Kuoppala (2017-09-04 11:19:09)
> Chris Wilson  writes:
> > +/*
> > + * Wait until the work is finally complete, even if it tries to postpone
> > + * by requeueing itself. Note, that if the worker never cancels itself,
> > + * we will spin forever.
> > + */
> > +static inline void drain_delayed_work(struct delayed_work *dw)
> > +{
> > + do {
> > + while (flush_delayed_work(dw))
> > + ;
> > + } while (delayed_work_pending(dw));
> 
> So we end spinning if someone doesn't let go of mutex.
> 
> Add a timeout for doing this and let it slide into
> suspend with a error message anyways instead of spinning
> forever?

Such error messages already exist (NMI). But for us if we skip flushing
the work, we leave the driver in a very inconsistent state and expect to
blow up on resume. It's a lose-lose.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-04 Thread Mika Kuoppala
Chris Wilson  writes:

> If a worker requeues itself, it may switch to a different kworker pool,
> which flush_work() considers as complete. To be strict, we then need to
> keep flushing the work until it is no longer pending.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
> Signed-off-by: Chris Wilson 
> Cc: Mika Kuoppala 
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |  3 +--
>  drivers/gpu/drm/i915/i915_gem.c |  3 +--
>  drivers/gpu/drm/i915/i915_utils.h   | 13 +
>  3 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
> b/drivers/gpu/drm/i915/i915_debugfs.c
> index 48572b157222..1dd687a74879 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4202,8 +4202,7 @@ fault_irq_set(struct drm_i915_private *i915,
>   mutex_unlock(>drm.struct_mutex);
>  
>   /* Flush idle worker to disarm irq */
> - while (flush_delayed_work(>gt.idle_work))
> - ;
> + drain_delayed_work(>gt.idle_work);
>  
>   return 0;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e4cc08bc518c..1829d3fa15d1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4582,8 +4582,7 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>   /* As the idle_work is rearming if it detects a race, play safe and
>* repeat the flush until it is definitely idle.
>*/
> - while (flush_delayed_work(_priv->gt.idle_work))
> - ;
> + drain_delayed_work(_priv->gt.idle_work);
>  
>   /* Assert that we sucessfully flushed all the work and
>* reset the GPU back to its idle, low power state.
> diff --git a/drivers/gpu/drm/i915/i915_utils.h 
> b/drivers/gpu/drm/i915/i915_utils.h
> index 12fc250b47b9..4f7ffa0976b1 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -119,4 +119,17 @@ static inline void __list_del_many(struct list_head 
> *head,
>   WRITE_ONCE(head->next, first);
>  }
>  
> +/*
> + * Wait until the work is finally complete, even if it tries to postpone
> + * by requeueing itself. Note, that if the worker never cancels itself,
> + * we will spin forever.
> + */
> +static inline void drain_delayed_work(struct delayed_work *dw)
> +{
> + do {
> + while (flush_delayed_work(dw))
> + ;
> + } while (delayed_work_pending(dw));

So we end spinning if someone doesn't let go of mutex.

Add a timeout for doing this and let it slide into
suspend with a error message anyways instead of spinning
forever?

-Mika

> +}
> +
>  #endif /* !__I915_UTILS_H */
> -- 
> 2.14.1
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-04 Thread Daniel Vetter
On Fri, Sep 01, 2017 at 03:11:23PM +0100, Chris Wilson wrote:
> If a worker requeues itself, it may switch to a different kworker pool,
> which flush_work() considers as complete. To be strict, we then need to
> keep flushing the work until it is no longer pending.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
> Signed-off-by: Chris Wilson 
> Cc: Mika Kuoppala 

Shouldn't this be a thing the workqueue subsystem exposes? Adding Tejun et
al.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |  3 +--
>  drivers/gpu/drm/i915/i915_gem.c |  3 +--
>  drivers/gpu/drm/i915/i915_utils.h   | 13 +
>  3 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
> b/drivers/gpu/drm/i915/i915_debugfs.c
> index 48572b157222..1dd687a74879 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4202,8 +4202,7 @@ fault_irq_set(struct drm_i915_private *i915,
>   mutex_unlock(>drm.struct_mutex);
>  
>   /* Flush idle worker to disarm irq */
> - while (flush_delayed_work(>gt.idle_work))
> - ;
> + drain_delayed_work(>gt.idle_work);
>  
>   return 0;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e4cc08bc518c..1829d3fa15d1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4582,8 +4582,7 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>   /* As the idle_work is rearming if it detects a race, play safe and
>* repeat the flush until it is definitely idle.
>*/
> - while (flush_delayed_work(_priv->gt.idle_work))
> - ;
> + drain_delayed_work(_priv->gt.idle_work);
>  
>   /* Assert that we sucessfully flushed all the work and
>* reset the GPU back to its idle, low power state.
> diff --git a/drivers/gpu/drm/i915/i915_utils.h 
> b/drivers/gpu/drm/i915/i915_utils.h
> index 12fc250b47b9..4f7ffa0976b1 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -119,4 +119,17 @@ static inline void __list_del_many(struct list_head 
> *head,
>   WRITE_ONCE(head->next, first);
>  }
>  
> +/*
> + * Wait until the work is finally complete, even if it tries to postpone
> + * by requeueing itself. Note, that if the worker never cancels itself,
> + * we will spin forever.
> + */
> +static inline void drain_delayed_work(struct delayed_work *dw)
> +{
> + do {
> + while (flush_delayed_work(dw))
> + ;
> + } while (delayed_work_pending(dw));
> +}
> +
>  #endif /* !__I915_UTILS_H */
> -- 
> 2.14.1
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Try harder to finish the idle-worker

2017-09-01 Thread Chris Wilson
If a worker requeues itself, it may switch to a different kworker pool,
which flush_work() considers as complete. To be strict, we then need to
keep flushing the work until it is no longer pending.

References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
---
 drivers/gpu/drm/i915/i915_debugfs.c |  3 +--
 drivers/gpu/drm/i915/i915_gem.c |  3 +--
 drivers/gpu/drm/i915/i915_utils.h   | 13 +
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 48572b157222..1dd687a74879 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4202,8 +4202,7 @@ fault_irq_set(struct drm_i915_private *i915,
mutex_unlock(>drm.struct_mutex);
 
/* Flush idle worker to disarm irq */
-   while (flush_delayed_work(>gt.idle_work))
-   ;
+   drain_delayed_work(>gt.idle_work);
 
return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e4cc08bc518c..1829d3fa15d1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4582,8 +4582,7 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
/* As the idle_work is rearming if it detects a race, play safe and
 * repeat the flush until it is definitely idle.
 */
-   while (flush_delayed_work(_priv->gt.idle_work))
-   ;
+   drain_delayed_work(_priv->gt.idle_work);
 
/* Assert that we sucessfully flushed all the work and
 * reset the GPU back to its idle, low power state.
diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index 12fc250b47b9..4f7ffa0976b1 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -119,4 +119,17 @@ static inline void __list_del_many(struct list_head *head,
WRITE_ONCE(head->next, first);
 }
 
+/*
+ * Wait until the work is finally complete, even if it tries to postpone
+ * by requeueing itself. Note, that if the worker never cancels itself,
+ * we will spin forever.
+ */
+static inline void drain_delayed_work(struct delayed_work *dw)
+{
+   do {
+   while (flush_delayed_work(dw))
+   ;
+   } while (delayed_work_pending(dw));
+}
+
 #endif /* !__I915_UTILS_H */
-- 
2.14.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx