[Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim

2018-11-08 Thread Chris Wilson
Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
in the shrinker while performing direct-reclaim. The trade-off being
(much) lower latency for non-i915 clients at an increased risk of being
unable to obtain a page from direct-reclaim without hitting the
oom-notifier. The proviso being that we still try hard to obtain the
lock for oom, so that we can reap under heavy memory pressure.

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index ea90d3a0d511..d461f458f4af 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -36,7 +36,9 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 
-static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
+static bool shrinker_lock(struct drm_i915_private *i915,
+                          unsigned int flags,
+                          bool *unlock)
 {
         switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
         case MUTEX_TRYLOCK_RECURSIVE:
@@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
 
         case MUTEX_TRYLOCK_FAILED:
                 *unlock = false;
-                preempt_disable();
-                do {
-                        cpu_relax();
-                        if (mutex_trylock(&i915->drm.struct_mutex)) {
-                                *unlock = true;
-                                break;
-                        }
-                } while (!need_resched());
-                preempt_enable();
+                if (flags & I915_SHRINK_ACTIVE) {
+                        mutex_lock_nested(&i915->drm.struct_mutex,
+                                          I915_MM_SHRINKER);
+                        *unlock = true;
+                }
                 return *unlock;
 
         case MUTEX_TRYLOCK_SUCCESS:
@@ -160,7 +158,7 @@ i915_gem_shrink(struct drm_i915_private *i915,
         unsigned long scanned = 0;
         bool unlock;
 
-        if (!shrinker_lock(i915, &unlock))
+        if (!shrinker_lock(i915, flags, &unlock))
                 return 0;
 
         /*
@@ -357,7 +355,7 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 
         sc->nr_scanned = 0;
 
-        if (!shrinker_lock(i915, &unlock))
+        if (!shrinker_lock(i915, 0, &unlock))
                 return SHRINK_STOP;
 
         freed = i915_gem_shrink(i915,
@@ -397,7 +395,7 @@ shrinker_lock_uninterruptible(struct drm_i915_private *i915, bool *unlock,
         do {
                 if (i915_gem_wait_for_idle(i915,
                                            0, MAX_SCHEDULE_TIMEOUT) == 0 &&
-                    shrinker_lock(i915, unlock))
+                    shrinker_lock(i915, 0, unlock))
                         break;
 
                 schedule_timeout_killable(1);
-- 
2.19.1
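
For reference, a condensed sketch of how shrinker_lock() ends up reading
with this patch applied. It is pieced together from the hunks above; the
bodies of the MUTEX_TRYLOCK_RECURSIVE and MUTEX_TRYLOCK_SUCCESS arms are
not shown in the diff and are assumed from the surrounding context, so
treat it as illustrative rather than authoritative:

static bool shrinker_lock(struct drm_i915_private *i915,
                          unsigned int flags,
                          bool *unlock)
{
        switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
        case MUTEX_TRYLOCK_RECURSIVE:
                /* struct_mutex is already held by this task; use it,
                 * but do not drop it on the way out. */
                *unlock = false;
                return true;

        case MUTEX_TRYLOCK_FAILED:
                *unlock = false;
                /* Direct-reclaim now gives up immediately on contention;
                 * only the oom / I915_SHRINK_ACTIVE path still blocks
                 * for the lock. */
                if (flags & I915_SHRINK_ACTIVE) {
                        mutex_lock_nested(&i915->drm.struct_mutex,
                                          I915_MM_SHRINKER);
                        *unlock = true;
                }
                return *unlock;

        case MUTEX_TRYLOCK_SUCCESS:
                *unlock = true;
                return true;
        }

        BUG();
}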



Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim

2018-11-08 Thread Tvrtko Ursulin


On 08/11/2018 08:17, Chris Wilson wrote:

Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
in the shrinker while performing direct-reclaim. The trade-off being
(much) lower latency for non-i915 clients at an increased risk of being
unable to obtain a page from direct-reclaim without hitting the
oom-notifier. The proviso being that we still keep trying to hard
obtain the lock for oom so that we can reap under heavy memory pressure.

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++-
  1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index ea90d3a0d511..d461f458f4af 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -36,7 +36,9 @@
  #include "i915_drv.h"
  #include "i915_trace.h"
  
-static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)

+static bool shrinker_lock(struct drm_i915_private *i915,
+ unsigned int flags,
+ bool *unlock)
  {
switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
case MUTEX_TRYLOCK_RECURSIVE:
@@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private *i915, 
bool *unlock)
  
  	case MUTEX_TRYLOCK_FAILED:

*unlock = false;
-   preempt_disable();
-   do {
-   cpu_relax();
-   if (mutex_trylock(&i915->drm.struct_mutex)) {
-   *unlock = true;
-   break;
-   }
-   } while (!need_resched());
-   preempt_enable();
+   if (flags & I915_SHRINK_ACTIVE) {


So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink 
in the normal case (direct reclaim?) or oom, we bail out on the first 
sign of struct mutex contention. Doesn't this make our shrinker much 
less effective at runtime and why is that OK?


Or in other words, for what use cases, tests or benchmark was the 
existing approach of busy looping a problem?



+   mutex_lock_nested(&i915->drm.struct_mutex,
+ I915_MM_SHRINKER);


_nested is needed since abandoning trylock would otherwise cause lockdep 
issues? But is it really safe? I don't know.. how can it be? It is 
guaranteed to be a different process here otherwise the result wouldn't 
be MUTEX_TRYLOCK_FAILED.


Also, I915_MM_SHRINKER is so far only documented to apply to obj->mm.lock.

Regards,

Tvrtko


+   *unlock = true;
+   }
return *unlock;
  
  	case MUTEX_TRYLOCK_SUCCESS:

@@ -160,7 +158,7 @@ i915_gem_shrink(struct drm_i915_private *i915,
unsigned long scanned = 0;
bool unlock;
  
-	if (!shrinker_lock(i915, &unlock))

+   if (!shrinker_lock(i915, flags, &unlock))
return 0;
  
  	/*

@@ -357,7 +355,7 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct 
shrink_control *sc)
  
  	sc->nr_scanned = 0;
  
-	if (!shrinker_lock(i915, &unlock))

+   if (!shrinker_lock(i915, 0, &unlock))
return SHRINK_STOP;
  
  	freed = i915_gem_shrink(i915,

@@ -397,7 +395,7 @@ shrinker_lock_uninterruptible(struct drm_i915_private 
*i915, bool *unlock,
do {
if (i915_gem_wait_for_idle(i915,
   0, MAX_SCHEDULE_TIMEOUT) == 0 &&
-   shrinker_lock(i915, unlock))
+   shrinker_lock(i915, 0, unlock))
break;
  
  		schedule_timeout_killable(1);





Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim

2018-11-08 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-11-08 16:23:08)
> 
> On 08/11/2018 08:17, Chris Wilson wrote:
> > Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
> > in the shrinker while performing direct-reclaim. The trade-off being
> > (much) lower latency for non-i915 clients at an increased risk of being
> > unable to obtain a page from direct-reclaim without hitting the
> > oom-notifier. The proviso being that we still keep trying to hard
> > obtain the lock for oom so that we can reap under heavy memory pressure.
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Tvrtko Ursulin 
> > ---
> >   drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++-
> >   1 file changed, 11 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
> > b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > index ea90d3a0d511..d461f458f4af 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > @@ -36,7 +36,9 @@
> >   #include "i915_drv.h"
> >   #include "i915_trace.h"
> >   
> > -static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
> > +static bool shrinker_lock(struct drm_i915_private *i915,
> > +   unsigned int flags,
> > +   bool *unlock)
> >   {
> >   switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
> >   case MUTEX_TRYLOCK_RECURSIVE:
> > @@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private 
> > *i915, bool *unlock)
> >   
> >   case MUTEX_TRYLOCK_FAILED:
> >   *unlock = false;
> > - preempt_disable();
> > - do {
> > - cpu_relax();
> > - if (mutex_trylock(&i915->drm.struct_mutex)) {
> > - *unlock = true;
> > - break;
> > - }
> > - } while (!need_resched());
> > - preempt_enable();
> > + if (flags & I915_SHRINK_ACTIVE) {
> 
> So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink 
> in the normal case (direct reclaim?) or oom, we bail out on the first 
> sign of struct mutex contention. Doesn't this make our shrinker much 
> less effective at runtime and why is that OK?

As I said, it's a tradeoff between blocking others for _several_
_seconds_ and making no progress, versus returning immediately and making
no progress. My argument is along the lines of: if direct-reclaim is running
in another process and something else is engaged in the driver hopefully
the driver will be cleaning up as it goes along or else what remains is
active and won't be reaped anyway. If direct reclaim is failing, the
delay before trying the oom path is insignificant.

> Or in other words, for what use cases, tests or benchmark was the 
> existing approach of busy looping a problem?

Do something like 'find / -exec cat' while running i915 and see how long
you have to wait for a khungtaskd :|

gem_syslatency reports max latencies of over 600s, and I'm sure it's
pretty much unbounded. It's bad enough that I expect we are the cause of
significant hitching (mainly in other tasks) on the desktop running at
memory capacity.

> > + mutex_lock_nested(&i915->drm.struct_mutex,
> > +   I915_MM_SHRINKER);
> 
> _nested is needed since abandoning trylock would otherwise cause lockdep 
> issues? But is it really safe? I don't know.. how can it be? It is 
> guaranteed to be a different process here otherwise the result wouldn't 
> be MUTEX_TRYLOCK_FAILED.

The easy option was to only force the mutex_lock for kswapd. However,
noting that we do need to forcibly shrink before oom, I opted for the
more inclusive attempt to take it in both. For the oom approach, my only
handwaving is we shouldn't be nested under oom serialisation and so
avoid an ABBA deadlock. It's survived some oom abuse, but the machine
still becomes unresponsive (but pingable) eventually (as it does before
the patch).
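
As an aside on the lockdep point, this is the standard use of a subclass
annotation. A minimal, generic illustration (not taken from the patch;
struct object and transfer() are made up for the example): two locks of
the same lock class are taken in a known order, and the inner acquisition
is given a different subclass so lockdep does not report the nesting as
the class deadlocking against itself:

struct object {
        struct mutex lock;
};

static void transfer(struct object *src, struct object *dst)
{
        mutex_lock(&src->lock);         /* implicit subclass 0 */
        mutex_lock_nested(&dst->lock, SINGLE_DEPTH_NESTING);

        /* ... move state from src to dst ... */

        mutex_unlock(&dst->lock);
        mutex_unlock(&src->lock);
}

The patch passes I915_MM_SHRINKER as the subclass for struct_mutex in the
same spirit.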

> Also, I915_MM_SHRINKER is so far only documented to apply to obj->mm.lock.

It suits very nicely as it is being applied for the same purpose.
-Chris


Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim

2018-11-08 Thread Tvrtko Ursulin


On 08/11/2018 16:48, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-11-08 16:23:08)


On 08/11/2018 08:17, Chris Wilson wrote:

Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
in the shrinker while performing direct-reclaim. The trade-off being
(much) lower latency for non-i915 clients at an increased risk of being
unable to obtain a page from direct-reclaim without hitting the
oom-notifier. The proviso being that we still keep trying to hard
obtain the lock for oom so that we can reap under heavy memory pressure.

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
   drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++-
   1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index ea90d3a0d511..d461f458f4af 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -36,7 +36,9 @@
   #include "i915_drv.h"
   #include "i915_trace.h"
   
-static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)

+static bool shrinker_lock(struct drm_i915_private *i915,
+   unsigned int flags,
+   bool *unlock)
   {
   switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
   case MUTEX_TRYLOCK_RECURSIVE:
@@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private *i915, 
bool *unlock)
   
   case MUTEX_TRYLOCK_FAILED:

   *unlock = false;
- preempt_disable();
- do {
- cpu_relax();
- if (mutex_trylock(&i915->drm.struct_mutex)) {
- *unlock = true;
- break;
- }
- } while (!need_resched());
- preempt_enable();
+ if (flags & I915_SHRINK_ACTIVE) {


So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink
in the normal case (direct reclaim?) or oom, we bail out on the first
sign of struct mutex contention. Doesn't this make our shrinker much
less effective at runtime and why is that OK?


As I said, it's a tradeoff between blocking others for _several_
_seconds_ and making no progress and returning immediately and making no
progress. My argument is along the lines of if direct-reclaim is running
in another process and something else is engaged in the driver hopefully
the driver will be cleaning up as it goes along or else what remains is
active and won't be reaped anyway. If direct reclaim is failing, the
delay before trying the oom path is insignificant.


What was the rationale behind busy-looping there, btw? Compared to,
perhaps, an alternative of micro-sleeps and trying a few times? I know it
would be the opposite of what this patch is trying to achieve, I just
don't have a good judgement on what makes most sense for the shrinker. Is
it better to try a little bit harder instead of giving up immediately,
but try harder in a softer way? Or does that end up blocking the callers
and have the same effect of making no progress?
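
To make the alternative concrete, something along these lines is what I
have in mind -- purely a sketch of the "softer" retry, with a made-up
function name, retry count and sleep bounds (needs linux/delay.h for
usleep_range), not a proposal:

/* Hypothetical variant, not in the tree: bounded, sleepy retries
 * instead of the removed cpu_relax() busy-wait. */
static bool shrinker_lock_polite(struct drm_i915_private *i915, bool *unlock)
{
        int retries = 5;

        do {
                if (mutex_trylock(&i915->drm.struct_mutex)) {
                        *unlock = true;
                        return true;
                }

                usleep_range(100, 500); /* back off instead of spinning */
        } while (--retries);

        *unlock = false;
        return false;
}

The sleep keeps the caller schedulable, unlike the preempt_disable()
spin, at the cost of up to a couple of milliseconds of extra latency per
reclaim attempt.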



Or in other words, for what use cases, tests or benchmark was the
existing approach of busy looping a problem?


Do something like 'find / -exec cat' while running i915 and see how long
you have to wait for a khungtaskd :|


I couldn't reproduce anything strange with this. Assuming you meant 
something like -exec cat {} \; > /dev/null.


Either way, I think explanations like this should go into the commit 
message.



gem_syslatency reports max latencies of over 600s, and I'm sure it's
pretty much unbounded. It's bad enough that I expect we are the cause of
significant hitching (mainly in other tasks) on the desktop running at
memory capacity.


Running gem_syslatency in parallel with the find, or gem_syslatency -b,
in parallel or not, did not show anything.


Then I tried gem_shrink, but that doesn't seem to be making any progress
with or without the patch. It's not even hitting i915 hard, though;
perhaps I need to turn off lockdep... which would be a bummer, since I
wanted to have lockdep enabled to check the next bit...



+ mutex_lock_nested(&i915->drm.struct_mutex,
+   I915_MM_SHRINKER);


_nested is needed since abandoning trylock would otherwise cause lockdep
issues? But is it really safe? I don't know.. how can it be? It is
guaranteed to be a different process here otherwise the result wouldn't
be MUTEX_TRYLOCK_FAILED.


The easy option was to only force the mutex_lock for kswapd. However,
noting that we do need to forcibly shrink before oom, I opted for the
more inclusive attempt to take it in both. For the oom approach, my only
handwaving is we shouldn't be nested under oom serialisation and so
avoid an ABBA deadlock. It's survived some oom abuse, but the machine
still becomes unresponsive (but pingable) eventually (as it does before
the patch).


...which I completely di

Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim

2018-11-09 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-11-09 07:30:34)
> 
> On 08/11/2018 16:48, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-11-08 16:23:08)
> >>
> >> On 08/11/2018 08:17, Chris Wilson wrote:
> >>> Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
> >>> in the shrinker while performing direct-reclaim. The trade-off being
> >>> (much) lower latency for non-i915 clients at an increased risk of being
> >>> unable to obtain a page from direct-reclaim without hitting the
> >>> oom-notifier. The proviso being that we still keep trying to hard
> >>> obtain the lock for oom so that we can reap under heavy memory pressure.
> >>>
> >>> Signed-off-by: Chris Wilson 
> >>> Cc: Tvrtko Ursulin 
> >>> ---
> >>>drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++-
> >>>1 file changed, 11 insertions(+), 13 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
> >>> b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >>> index ea90d3a0d511..d461f458f4af 100644
> >>> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >>> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >>> @@ -36,7 +36,9 @@
> >>>#include "i915_drv.h"
> >>>#include "i915_trace.h"
> >>>
> >>> -static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
> >>> +static bool shrinker_lock(struct drm_i915_private *i915,
> >>> +   unsigned int flags,
> >>> +   bool *unlock)
> >>>{
> >>>switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
> >>>case MUTEX_TRYLOCK_RECURSIVE:
> >>> @@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private 
> >>> *i915, bool *unlock)
> >>>
> >>>case MUTEX_TRYLOCK_FAILED:
> >>>*unlock = false;
> >>> - preempt_disable();
> >>> - do {
> >>> - cpu_relax();
> >>> - if (mutex_trylock(&i915->drm.struct_mutex)) {
> >>> - *unlock = true;
> >>> - break;
> >>> - }
> >>> - } while (!need_resched());
> >>> - preempt_enable();
> >>> + if (flags & I915_SHRINK_ACTIVE) {
> >>
> >> So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink
> >> in the normal case (direct reclaim?) or oom, we bail out on the first
> >> sign of struct mutex contention. Doesn't this make our shrinker much
> >> less effective at runtime and why is that OK?
> > 
> > As I said, it's a tradeoff between blocking others for _several_
> > _seconds_ and making no progress and returning immediately and making no
> > progress. My argument is along the lines of if direct-reclaim is running
> > in another process and something else is engaged in the driver hopefully
> > the driver will be cleaning up as it goes along or else what remains is
> > active and won't be reaped anyway. If direct reclaim is failing, the
> > delay before trying the oom path is insignificant.
> 
> What was the rationale behind busy looping there btw?

Emulating the optimistic spin for mutex (my patches to expose it from
kernel/locking were kept hidden for public decency). My thinking was the
exact opposite to this patch, that direct reclaim was of paramount
importance and spending the time to try and ensure we grabbed the
struct_mutex to search for some pages to free was preferable.

It's just that, on the basis of looking at the actual syslatency and
realising the cause is this spinner, I want to swing the axe in the other
direction.

(There's probably a compromise, but honestly I'd prefer to sell the
struct_mutex free version of the shrinker first :)

> Compared to 
> perhaps an alternative of micro-sleeps and trying a few times? I know it 
> would be opposite from what this patch is trying to achieve, I Just 
> don't had a good judgment on what makes most sense for the shrinker. Is 
> it better to perhaps try a little bit harder instead of giving up 
> immediately, but try a little bit harder in a softer way? Or that ends 
> up blocking the callers and has the same effect of making no progress?

Exactly. We can definitely measure the impact of the spinner on
unrelated processes, but detecting the premature allocation failure is
harder (we wait for more dmesg-warns). The compromise that I've tried to
reach here is that if direct-reclaim isn't enough, then we should still
try hard to grab the struct_mutex. (That leaves __GFP_RETRY_MAYFAIL
vulnerable to not shrinking i915, but a worthwhile compromise as it's
allowed to fail?)
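
For instance, an allocation site like the following (hypothetical, not
from i915; grab_a_page() is made up for illustration) opts out of the oom
path entirely, so it only ever exercises the direct-reclaim behaviour
that now bails on struct_mutex contention:

static int grab_a_page(struct page **out)
{
        struct page *page;

        /* Retries reclaim but fails rather than invoking the oom-killer,
         * so the oom notifier (and the forced shrink) never runs for it. */
        page = alloc_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL);
        if (!page)
                return -ENOMEM; /* caller is expected to cope */

        *out = page;
        return 0;
}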

> >> Or in other words, for what use cases, tests or benchmark was the
> >> existing approach of busy looping a problem?
> > 
> > Do something like 'find / -exec cat' while running i915 and see how long
> > you have to wait for a khungtaskd :|
> 
> I couldn't reproduce anything strange with this. Assuming you meant 
> something like -exec cat { } \; >dev/null.
> 
> Either case I think explanations like this should go into the 

Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim

2018-11-13 Thread Tvrtko Ursulin


On 09/11/2018 11:44, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-11-09 07:30:34)


On 08/11/2018 16:48, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-11-08 16:23:08)


On 08/11/2018 08:17, Chris Wilson wrote:

Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
in the shrinker while performing direct-reclaim. The trade-off being
(much) lower latency for non-i915 clients at an increased risk of being
unable to obtain a page from direct-reclaim without hitting the
oom-notifier. The proviso being that we still keep trying to hard
obtain the lock for oom so that we can reap under heavy memory pressure.

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++-
1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index ea90d3a0d511..d461f458f4af 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -36,7 +36,9 @@
#include "i915_drv.h"
#include "i915_trace.h"

-static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)

+static bool shrinker_lock(struct drm_i915_private *i915,
+   unsigned int flags,
+   bool *unlock)
{
switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
case MUTEX_TRYLOCK_RECURSIVE:
@@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private *i915, 
bool *unlock)

case MUTEX_TRYLOCK_FAILED:

*unlock = false;
- preempt_disable();
- do {
- cpu_relax();
- if (mutex_trylock(&i915->drm.struct_mutex)) {
- *unlock = true;
- break;
- }
- } while (!need_resched());
- preempt_enable();
+ if (flags & I915_SHRINK_ACTIVE) {


So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink
in the normal case (direct reclaim?) or oom, we bail out on the first
sign of struct mutex contention. Doesn't this make our shrinker much
less effective at runtime and why is that OK?


As I said, it's a tradeoff between blocking others for _several_
_seconds_ and making no progress and returning immediately and making no
progress. My argument is along the lines of if direct-reclaim is running
in another process and something else is engaged in the driver hopefully
the driver will be cleaning up as it goes along or else what remains is
active and won't be reaped anyway. If direct reclaim is failing, the
delay before trying the oom path is insignificant.


What was the rationale behind busy looping there btw?


Emulating the optimistic spin for mutex (my patches to expose it from
kernel/locking were kept hidden for public decency). My thinking was the
exact opposite to this patch, that direct reclaim was of paramount
importance and spending the time to try and ensure we grabbed the
struct_mutex to search for some pages to free was preferable.

It's just on the basis of looking at the actual syslatency and realising
the cause is this spinner, I want to swing the axe in other direction.

(There's probably a compromise, but honestly I'd prefer to sell the
struct_mutex free version of the shrinker first :)


Compared to
perhaps an alternative of micro-sleeps and trying a few times? I know it
would be opposite from what this patch is trying to achieve, I Just
don't had a good judgment on what makes most sense for the shrinker. Is
it better to perhaps try a little bit harder instead of giving up
immediately, but try a little bit harder in a softer way? Or that ends
up blocking the callers and has the same effect of making no progress?


Exactly. We can definitely measure the impact of the spinner on
unrelated processes, but detecting the premature allocation failure is
harder (we wait for more dmesg-warns). The compromise that I've tried to
reach here is that if direct-reclaim isn't enough, then we should still
try hard to grab the struct_mutex. (That leaves __GFP_RETRY_MAYFAIL
vulnerable to not shrinking i915, but a worthwhile compromise as it's
allowed to fail?)


Or in other words, for what use cases, tests or benchmark was the
existing approach of busy looping a problem?


Do something like 'find / -exec cat' while running i915 and see how long
you have to wait for a khungtaskd :|


I couldn't reproduce anything strange with this. Assuming you meant
something like -exec cat { } \; >dev/null.

Either case I think explanations like this should go into the commit
message.


Weird, I spent so long over the last few weeks talking about the impact
on gem_syslatency, I thought it was mentioned here.


gem_syslatency reports max latencies of over 600s, and I'm sure it's
pretty much unbounded. It's bad enough that I expect we are the cause of
significant hitching (mainly in other tasks) on the desktop running at
memory capacity.

Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim

2018-11-13 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-11-13 10:24:43)
> 
> On 09/11/2018 11:44, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-11-09 07:30:34)
> >>
> >> On 08/11/2018 16:48, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-11-08 16:23:08)
> 
>  On 08/11/2018 08:17, Chris Wilson wrote:
> > Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
> > in the shrinker while performing direct-reclaim. The trade-off being
> > (much) lower latency for non-i915 clients at an increased risk of being
> > unable to obtain a page from direct-reclaim without hitting the
> > oom-notifier. The proviso being that we still keep trying to hard
> > obtain the lock for oom so that we can reap under heavy memory pressure.
> >
> > Signed-off-by: Chris Wilson 
> > Cc: Tvrtko Ursulin 
> > ---
> > drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 
> > +++-
> > 1 file changed, 11 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
> > b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > index ea90d3a0d511..d461f458f4af 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > @@ -36,7 +36,9 @@
> > #include "i915_drv.h"
> > #include "i915_trace.h"
> > 
> > -static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
> > +static bool shrinker_lock(struct drm_i915_private *i915,
> > +   unsigned int flags,
> > +   bool *unlock)
> > {
> > switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
> > case MUTEX_TRYLOCK_RECURSIVE:
> > @@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private 
> > *i915, bool *unlock)
> > 
> > case MUTEX_TRYLOCK_FAILED:
> > *unlock = false;
> > - preempt_disable();
> > - do {
> > - cpu_relax();
> > - if (mutex_trylock(&i915->drm.struct_mutex)) {
> > - *unlock = true;
> > - break;
> > - }
> > - } while (!need_resched());
> > - preempt_enable();
> > + if (flags & I915_SHRINK_ACTIVE) {
> 
>  So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink
>  in the normal case (direct reclaim?) or oom, we bail out on the first
>  sign of struct mutex contention. Doesn't this make our shrinker much
>  less effective at runtime and why is that OK?
> >>>
> >>> As I said, it's a tradeoff between blocking others for _several_
> >>> _seconds_ and making no progress and returning immediately and making no
> >>> progress. My argument is along the lines of if direct-reclaim is running
> >>> in another process and something else is engaged in the driver hopefully
> >>> the driver will be cleaning up as it goes along or else what remains is
> >>> active and won't be reaped anyway. If direct reclaim is failing, the
> >>> delay before trying the oom path is insignificant.
> >>
> >> What was the rationale behind busy looping there btw?
> > 
> > Emulating the optimistic spin for mutex (my patches to expose it from
> > kernel/locking were kept hidden for public decency). My thinking was the
> > exact opposite to this patch, that direct reclaim was of paramount
> > importance and spending the time to try and ensure we grabbed the
> > struct_mutex to search for some pages to free was preferable.
> > 
> > It's just on the basis of looking at the actual syslatency and realising
> > the cause is this spinner, I want to swing the axe in other direction.
> > 
> > (There's probably a compromise, but honestly I'd prefer to sell the
> > struct_mutex free version of the shrinker first :)
> > 
> >> Compared to
> >> perhaps an alternative of micro-sleeps and trying a few times? I know it
> >> would be opposite from what this patch is trying to achieve, I Just
> >> don't had a good judgment on what makes most sense for the shrinker. Is
> >> it better to perhaps try a little bit harder instead of giving up
> >> immediately, but try a little bit harder in a softer way? Or that ends
> >> up blocking the callers and has the same effect of making no progress?
> > 
> > Exactly. We can definitely measure the impact of the spinner on
> > unrelated processes, but detecting the premature allocation failure is
> > harder (we wait for more dmesg-warns). The compromise that I've tried to
> > reach here is that if direct-reclaim isn't enough, then we should still
> > try hard to grab the struct_mutex. (That leaves __GFP_RETRY_MAYFAIL
> > vulnerable to not shrinking i915, but a worthwhile compromise as it's
> > allowed to fail?)
> > 
>  Or in other words, for what use cases, tests or benchmark was the
>  existing approach of busy looping a problem?

Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim

2018-11-15 Thread Tvrtko Ursulin


On 13/11/2018 17:10, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-11-13 10:24:43)


On 09/11/2018 11:44, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-11-09 07:30:34)


On 08/11/2018 16:48, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-11-08 16:23:08)


On 08/11/2018 08:17, Chris Wilson wrote:

Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
in the shrinker while performing direct-reclaim. The trade-off being
(much) lower latency for non-i915 clients at an increased risk of being
unable to obtain a page from direct-reclaim without hitting the
oom-notifier. The proviso being that we still keep trying to hard
obtain the lock for oom so that we can reap under heavy memory pressure.

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index ea90d3a0d511..d461f458f4af 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -36,7 +36,9 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 
-static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)

+static bool shrinker_lock(struct drm_i915_private *i915,
+   unsigned int flags,
+   bool *unlock)
 {
 switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
 case MUTEX_TRYLOCK_RECURSIVE:
@@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private *i915, 
bool *unlock)
 
 case MUTEX_TRYLOCK_FAILED:

 *unlock = false;
- preempt_disable();
- do {
- cpu_relax();
- if (mutex_trylock(&i915->drm.struct_mutex)) {
- *unlock = true;
- break;
- }
- } while (!need_resched());
- preempt_enable();
+ if (flags & I915_SHRINK_ACTIVE) {


So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink
in the normal case (direct reclaim?) or oom, we bail out on the first
sign of struct mutex contention. Doesn't this make our shrinker much
less effective at runtime and why is that OK?


As I said, it's a tradeoff between blocking others for _several_
_seconds_ and making no progress and returning immediately and making no
progress. My argument is along the lines of if direct-reclaim is running
in another process and something else is engaged in the driver hopefully
the driver will be cleaning up as it goes along or else what remains is
active and won't be reaped anyway. If direct reclaim is failing, the
delay before trying the oom path is insignificant.


What was the rationale behind busy looping there btw?


Emulating the optimistic spin for mutex (my patches to expose it from
kernel/locking were kept hidden for public decency). My thinking was the
exact opposite to this patch, that direct reclaim was of paramount
importance and spending the time to try and ensure we grabbed the
struct_mutex to search for some pages to free was preferable.

It's just on the basis of looking at the actual syslatency and realising
the cause is this spinner, I want to swing the axe in other direction.

(There's probably a compromise, but honestly I'd prefer to sell the
struct_mutex free version of the shrinker first :)


Compared to
perhaps an alternative of micro-sleeps and trying a few times? I know it
would be opposite from what this patch is trying to achieve, I Just
don't had a good judgment on what makes most sense for the shrinker. Is
it better to perhaps try a little bit harder instead of giving up
immediately, but try a little bit harder in a softer way? Or that ends
up blocking the callers and has the same effect of making no progress?


Exactly. We can definitely measure the impact of the spinner on
unrelated processes, but detecting the premature allocation failure is
harder (we wait for more dmesg-warns). The compromise that I've tried to
reach here is that if direct-reclaim isn't enough, then we should still
try hard to grab the struct_mutex. (That leaves __GFP_RETRY_MAYFAIL
vulnerable to not shrinking i915, but a worthwhile compromise as it's
allowed to fail?)


Or in other words, for what use cases, tests or benchmark was the
existing approach of busy looping a problem?


Do something like 'find / -exec cat' while running i915 and see how long
you have to wait for a khungtaskd :|


I couldn't reproduce anything strange with this. Assuming you meant
something like -exec cat { } \; >dev/null.

Either case I think explanations like this should go into the commit
message.


Weird, I spent so long over the last few weeks talking about the impact
on gem_syslatency, I thought it was mentioned here.


gem_syslatency reports max latencies of over 600s, and I'm sure it's
pretty much unbounded. It's bad enough that I expect we are the cause of
significant hitching (mainly in other tasks) on the desktop running at
memory capacity.