[Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
in the shrinker while performing direct-reclaim. The trade-off being
(much) lower latency for non-i915 clients at an increased risk of being
unable to obtain a page from direct-reclaim without hitting the
oom-notifier. The proviso being that we still keep trying to hard
obtain the lock for oom so that we can reap under heavy memory pressure.

Signed-off-by: Chris Wilson
Cc: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index ea90d3a0d511..d461f458f4af 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -36,7 +36,9 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 
-static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
+static bool shrinker_lock(struct drm_i915_private *i915,
+			  unsigned int flags,
+			  bool *unlock)
 {
 	switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
 	case MUTEX_TRYLOCK_RECURSIVE:
@@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
 
 	case MUTEX_TRYLOCK_FAILED:
 		*unlock = false;
-		preempt_disable();
-		do {
-			cpu_relax();
-			if (mutex_trylock(&i915->drm.struct_mutex)) {
-				*unlock = true;
-				break;
-			}
-		} while (!need_resched());
-		preempt_enable();
+		if (flags & I915_SHRINK_ACTIVE) {
+			mutex_lock_nested(&i915->drm.struct_mutex,
+					  I915_MM_SHRINKER);
+			*unlock = true;
+		}
 
 		return *unlock;
 
 	case MUTEX_TRYLOCK_SUCCESS:
@@ -160,7 +158,7 @@ i915_gem_shrink(struct drm_i915_private *i915,
 	unsigned long scanned = 0;
 	bool unlock;
 
-	if (!shrinker_lock(i915, &unlock))
+	if (!shrinker_lock(i915, flags, &unlock))
 		return 0;
 
 	/*
@@ -357,7 +355,7 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 
 	sc->nr_scanned = 0;
 
-	if (!shrinker_lock(i915, &unlock))
+	if (!shrinker_lock(i915, 0, &unlock))
 		return SHRINK_STOP;
 
 	freed = i915_gem_shrink(i915,
@@ -397,7 +395,7 @@ shrinker_lock_uninterruptible(struct drm_i915_private *i915, bool *unlock,
 	do {
 		if (i915_gem_wait_for_idle(i915, 0, MAX_SCHEDULE_TIMEOUT) == 0 &&
-		    shrinker_lock(i915, unlock))
+		    shrinker_lock(i915, 0, unlock))
 			break;
 
 		schedule_timeout_killable(1);
-- 
2.19.1
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
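For readers outside the kernel, the control flow of the patched shrinker_lock() can be sketched in userspace with POSIX mutexes. This is a hypothetical analogy, not the i915 code: pthread_mutex_trylock stands in for mutex_trylock, a blocking pthread_mutex_lock stands in for mutex_lock_nested (lockdep subclasses have no userspace equivalent), and SHRINK_ACTIVE is a made-up stand-in for I915_SHRINK_ACTIVE.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SHRINK_ACTIVE (1u << 0)	/* stand-in for I915_SHRINK_ACTIVE */

static pthread_mutex_t struct_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Returns true if the lock is held on return; *unlock tells the caller
 * whether it is responsible for dropping it.  Mirrors the patched logic:
 * on contention, bail out immediately unless this is the last-ditch
 * ACTIVE/oom pass, in which case block until the lock is obtained.
 */
static bool shrinker_lock(unsigned int flags, bool *unlock)
{
	if (pthread_mutex_trylock(&struct_mutex) == 0) {
		*unlock = true;
		return true;
	}

	*unlock = false;
	if (flags & SHRINK_ACTIVE) {
		/* kernel: mutex_lock_nested(..., I915_MM_SHRINKER) */
		pthread_mutex_lock(&struct_mutex);
		*unlock = true;
	}

	return *unlock;
}
```

The point of the sketch is the asymmetry: a plain direct-reclaim pass (flags == 0) costs at most one failed trylock, while the oom path still waits as long as it takes.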
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
On 08/11/2018 08:17, Chris Wilson wrote:
> Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
> in the shrinker while performing direct-reclaim. The trade-off being
> (much) lower latency for non-i915 clients at an increased risk of being
> unable to obtain a page from direct-reclaim without hitting the
> oom-notifier. The proviso being that we still keep trying to hard
> obtain the lock for oom so that we can reap under heavy memory pressure.
> [snip]
>
>  	case MUTEX_TRYLOCK_FAILED:
>  		*unlock = false;
> -		preempt_disable();
> -		do {
> -			cpu_relax();
> -			if (mutex_trylock(&i915->drm.struct_mutex)) {
> -				*unlock = true;
> -				break;
> -			}
> -		} while (!need_resched());
> -		preempt_enable();
> +		if (flags & I915_SHRINK_ACTIVE) {

So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink
in the normal case (direct reclaim?) or oom, we bail out on the first
sign of struct mutex contention. Doesn't this make our shrinker much
less effective at runtime and why is that OK?

Or in other words, for what use cases, tests or benchmark was the
existing approach of busy looping a problem?

> +			mutex_lock_nested(&i915->drm.struct_mutex,
> +					  I915_MM_SHRINKER);

_nested is needed since abandoning trylock would otherwise cause lockdep
issues? But is it really safe? I don't know.. how can it be? It is
guaranteed to be a different process here otherwise the result wouldn't
be MUTEX_TRYLOCK_FAILED.

Also, I915_MM_SHRINKER is so far only documented to apply to obj->mm.lock.

Regards,

Tvrtko
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
Quoting Tvrtko Ursulin (2018-11-08 16:23:08)
> On 08/11/2018 08:17, Chris Wilson wrote:
> [snip]
> > +		if (flags & I915_SHRINK_ACTIVE) {
>
> So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink
> in the normal case (direct reclaim?) or oom, we bail out on the first
> sign of struct mutex contention. Doesn't this make our shrinker much
> less effective at runtime and why is that OK?

As I said, it's a tradeoff between blocking others for _several_
_seconds_ and making no progress and returning immediately and making no
progress. My argument is along the lines of if direct-reclaim is running
in another process and something else is engaged in the driver hopefully
the driver will be cleaning up as it goes along or else what remains is
active and won't be reaped anyway. If direct reclaim is failing, the
delay before trying the oom path is insignificant.

> Or in other words, for what use cases, tests or benchmark was the
> existing approach of busy looping a problem?

Do something like 'find / -exec cat' while running i915 and see how long
you have to wait for a khungtaskd :|

gem_syslatency reports max latencies of over 600s, and I'm sure it's
pretty much unbounded. It's bad enough that I expect we are the cause of
significant hitching (mainly in other tasks) on the desktop running at
memory capacity.

> > +			mutex_lock_nested(&i915->drm.struct_mutex,
> > +					  I915_MM_SHRINKER);
>
> _nested is needed since abandoning trylock would otherwise cause lockdep
> issues? But is it really safe? I don't know.. how can it be? It is
> guaranteed to be a different process here otherwise the result wouldn't
> be MUTEX_TRYLOCK_FAILED.

The easy option was to only force the mutex_lock for kswapd. However,
noting that we do need to forcibly shrink before oom, I opted for the
more inclusive attempt to take it in both. For the oom approach, my only
handwaving is we shouldn't be nested under oom serialisation and so
avoid an ABBA deadlock. It's survived some oom abuse, but the machine
still becomes unresponsive (but pingable) eventually (as it does before
the patch).

> Also, I915_MM_SHRINKER is so far only documented to apply to obj->mm.lock.

It suits very nicely as it being applied for the same purpose.
-Chris
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
On 08/11/2018 16:48, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-08 16:23:08)
> [snip]
>
> As I said, it's a tradeoff between blocking others for _several_
> _seconds_ and making no progress and returning immediately and making no
> progress. My argument is along the lines of if direct-reclaim is running
> in another process and something else is engaged in the driver hopefully
> the driver will be cleaning up as it goes along or else what remains is
> active and won't be reaped anyway. If direct reclaim is failing, the
> delay before trying the oom path is insignificant.

What was the rationale behind busy looping there btw? Compared to
perhaps an alternative of micro-sleeps and trying a few times? I know it
would be opposite from what this patch is trying to achieve, I just
don't have a good judgment on what makes most sense for the shrinker.
Is it better to perhaps try a little bit harder instead of giving up
immediately, but try a little bit harder in a softer way? Or does that
end up blocking the callers and have the same effect of making no
progress?

> > Or in other words, for what use cases, tests or benchmark was the
> > existing approach of busy looping a problem?
>
> Do something like 'find / -exec cat' while running i915 and see how long
> you have to wait for a khungtaskd :|

I couldn't reproduce anything strange with this. Assuming you meant
something like -exec cat {} \; > /dev/null.

Either case I think explanations like this should go into the commit
message.

> gem_syslatency reports max latencies of over 600s, and I'm sure it's
> pretty much unbounded. It's bad enough that I expect we are the cause of
> significant hitching (mainly in other tasks) on the desktop running at
> memory capacity.

Running gem_syslatency in parallel to the find, or gem_syslatency -b in
parallel or not, did not do anything. Then I tried gem_shrink but that
doesn't seem to be making any progress with or without the patch. But
it's not even hitting the i915 hard, and perhaps I need to turn off
lockdep.. which would be a bummer since I wanted to have lockdep to
check the next bit..

> > > +			mutex_lock_nested(&i915->drm.struct_mutex,
> > > +					  I915_MM_SHRINKER);
> >
> > _nested is needed since abandoning trylock would otherwise cause lockdep
> > issues? But is it really safe? I don't know.. how can it be? It is
> > guaranteed to be a different process here otherwise the result wouldn't
> > be MUTEX_TRYLOCK_FAILED.
>
> The easy option was to only force the mutex_lock for kswapd. However,
> noting that we do need to forcibly shrink before oom, I opted for the
> more inclusive attempt to take it in both. For the oom approach, my only
> handwaving is we shouldn't be nested under oom serialisation and so
> avoid an ABBA deadlock. It's survived some oom abuse, but the machine
> still becomes unresponsive (but pingable) eventually (as it does before
> the patch).

...which I completely di
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
Quoting Tvrtko Ursulin (2018-11-09 07:30:34)
> On 08/11/2018 16:48, Chris Wilson wrote:
> [snip]
>
> What was the rationale behind busy looping there btw?

Emulating the optimistic spin for mutex (my patches to expose it from
kernel/locking were kept hidden for public decency). My thinking was the
exact opposite to this patch, that direct reclaim was of paramount
importance and spending the time to try and ensure we grabbed the
struct_mutex to search for some pages to free was preferable.

It's just on the basis of looking at the actual syslatency and realising
the cause is this spinner, I want to swing the axe in the other
direction.

(There's probably a compromise, but honestly I'd prefer to sell the
struct_mutex free version of the shrinker first :)

> Compared to
> perhaps an alternative of micro-sleeps and trying a few times? I know it
> would be opposite from what this patch is trying to achieve, I just
> don't have a good judgment on what makes most sense for the shrinker. Is
> it better to perhaps try a little bit harder instead of giving up
> immediately, but try a little bit harder in a softer way? Or that ends
> up blocking the callers and has the same effect of making no progress?

Exactly. We can definitely measure the impact of the spinner on
unrelated processes, but detecting the premature allocation failure is
harder (we wait for more dmesg-warns). The compromise that I've tried to
reach here is that if direct-reclaim isn't enough, then we should still
try hard to grab the struct_mutex. (That leaves __GFP_RETRY_MAYFAIL
vulnerable to not shrinking i915, but a worthwhile compromise as it's
allowed to fail?)

> > > Or in other words, for what use cases, tests or benchmark was the
> > > existing approach of busy looping a problem?
> >
> > Do something like 'find / -exec cat' while running i915 and see how long
> > you have to wait for a khungtaskd :|
>
> I couldn't reproduce anything strange with this. Assuming you meant
> something like -exec cat {} \; > /dev/null.
>
> Either case I think explanations like this should go into the
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
On 09/11/2018 11:44, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-09 07:30:34)
> [snip]
>
> > Either case I think explanations like this should go into the commit
> > message.
>
> Weird, I spent so long over the last few weeks talking about the impact
> on gem_syslatency, I thought it was mentioned here.
>
> > > gem_syslatency reports max latencies of over 600s, and I'm sure it's
> > > pretty much unbounded. It's bad enough that I expect we are the cause
> > > of significant hitching (mainly in othe
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
Quoting Tvrtko Ursulin (2018-11-13 10:24:43)
> On 09/11/2018 11:44, Chris Wilson wrote:
> [snip]
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
On 13/11/2018 17:10, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-13 10:24:43)
> [snip]