Quoting Tvrtko Ursulin (2019-02-11 11:43:41)
>
> On 06/02/2019 13:03, Chris Wilson wrote:
> > As kmem_caches share the same properties (size, allocation/free behaviour)
> > for all potential devices, we can use global caches. While this
> > potentially has worse fragmentation behaviour (one can argue that
> > different devices would have different activity lifetimes, but you can
> > also argue that activity is temporal across the system) it is the
> > default behaviour of the system at large to amalgamate matching caches.
> >
> > The benefit for us is much reduced pointer dancing along the frequent
> > allocation paths.
> >
> > v2: Defer shrinking until after a global grace period for futureproofing
> > multiple consumers of the slab caches, similar to the current strategy
> > for avoiding shrinking too early.
> >
> > Signed-off-by: Chris Wilson <[email protected]>
> > ---
> > drivers/gpu/drm/i915/Makefile | 1 +
> > drivers/gpu/drm/i915/i915_active.c | 7 +-
> > drivers/gpu/drm/i915/i915_active.h | 1 +
> > drivers/gpu/drm/i915/i915_drv.h | 3 -
> > drivers/gpu/drm/i915/i915_gem.c | 34 +-----
> > drivers/gpu/drm/i915/i915_globals.c | 105 ++++++++++++++++++
> > drivers/gpu/drm/i915/i915_globals.h | 15 +++
> > drivers/gpu/drm/i915/i915_pci.c | 8 +-
> > drivers/gpu/drm/i915/i915_request.c | 53 +++++++--
> > drivers/gpu/drm/i915/i915_request.h | 10 ++
> > drivers/gpu/drm/i915/i915_scheduler.c | 66 ++++++++---
> > drivers/gpu/drm/i915/i915_scheduler.h | 34 +++++-
> > drivers/gpu/drm/i915/intel_guc_submission.c | 3 +-
> > drivers/gpu/drm/i915/intel_lrc.c | 6 +-
> > drivers/gpu/drm/i915/intel_ringbuffer.h | 17 ---
> > drivers/gpu/drm/i915/selftests/intel_lrc.c | 2 +-
> > drivers/gpu/drm/i915/selftests/mock_engine.c | 48 ++++----
> > .../gpu/drm/i915/selftests/mock_gem_device.c | 26 -----
> > drivers/gpu/drm/i915/selftests/mock_request.c | 12 +-
> > drivers/gpu/drm/i915/selftests/mock_request.h | 7 --
> > 20 files changed, 306 insertions(+), 152 deletions(-)
> > create mode 100644 drivers/gpu/drm/i915/i915_globals.c
> > create mode 100644 drivers/gpu/drm/i915/i915_globals.h
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 1787e1299b1b..a1d834068765 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -77,6 +77,7 @@ i915-y += \
> > i915_gem_tiling.o \
> > i915_gem_userptr.o \
> > i915_gemfs.o \
> > + i915_globals.o \
> > i915_query.o \
> > i915_request.o \
> > i915_scheduler.o \
> > diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> > index 215b6ff8aa73..9026787ebdf8 100644
> > --- a/drivers/gpu/drm/i915/i915_active.c
> > +++ b/drivers/gpu/drm/i915/i915_active.c
> > @@ -280,7 +280,12 @@ int __init i915_global_active_init(void)
> > return 0;
> > }
> >
> > -void __exit i915_global_active_exit(void)
> > +void i915_global_active_shrink(void)
> > +{
> > + kmem_cache_shrink(global.slab_cache);
> > +}
> > +
> > +void i915_global_active_exit(void)
> > {
> > kmem_cache_destroy(global.slab_cache);
> > }
> > diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> > index 12b5c1d287d1..5fbd9102384b 100644
> > --- a/drivers/gpu/drm/i915/i915_active.h
> > +++ b/drivers/gpu/drm/i915/i915_active.h
> > @@ -420,6 +420,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
> > #endif
> >
> > int i915_global_active_init(void);
> > +void i915_global_active_shrink(void);
> > void i915_global_active_exit(void);
> >
> > #endif /* _I915_ACTIVE_H_ */
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 37230ae7fbe6..a365b1a2ea9a 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1459,9 +1459,6 @@ struct drm_i915_private {
> > struct kmem_cache *objects;
> > struct kmem_cache *vmas;
> > struct kmem_cache *luts;
> > - struct kmem_cache *requests;
> > - struct kmem_cache *dependencies;
> > - struct kmem_cache *priorities;
> >
> > const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
> > struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 1eb3a5f8654c..d18c4ccff370 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -42,6 +42,7 @@
> > #include "i915_drv.h"
> > #include "i915_gem_clflush.h"
> > #include "i915_gemfs.h"
> > +#include "i915_globals.h"
> > #include "i915_reset.h"
> > #include "i915_trace.h"
> > #include "i915_vgpu.h"
> > @@ -187,6 +188,8 @@ void i915_gem_unpark(struct drm_i915_private *i915)
> > if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
> > i915->gt.epoch = 1;
> >
> > + i915_globals_unpark();
> > +
> > intel_enable_gt_powersave(i915);
> > i915_update_gfx_val(i915);
> > if (INTEL_GEN(i915) >= 6)
> > @@ -2916,12 +2919,11 @@ static void shrink_caches(struct drm_i915_private *i915)
> > * filled slabs to prioritise allocating from the mostly full slabs,
> > * with the aim of reducing fragmentation.
> > */
> > - kmem_cache_shrink(i915->priorities);
> > - kmem_cache_shrink(i915->dependencies);
> > - kmem_cache_shrink(i915->requests);
> > kmem_cache_shrink(i915->luts);
> > kmem_cache_shrink(i915->vmas);
> > kmem_cache_shrink(i915->objects);
> > +
> > + i915_globals_park();
>
> Slightly confusing that the shrink caches path calls globals_park - ie
> after the device has been parked. Would i915_globals_shrink and
> __i915_globals_shrink be clearer? Not sure.
Final destination is __i915_gem_park. I could stick it there now, but
felt it clearer to have it as a sideways move atm.
With the last 3 slab caches converted over to globals, they all sit
behind the same rcu_work and we can remove our open-coded variant
(rcu_work is a recent invention).
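The shape being described here — every global cache behind a single rcu_work, with the epoch sampled at park time deciding whether the deferred shrink is still wanted — might look roughly like the sketch below. This is illustrative kernel-style code, not the exact patch; the `queue_park()` helper and the `park_work` layout are assumptions filled in around the quoted hunk:

```c
/* Illustrative sketch only: one rcu_work defers shrinking every global
 * cache until an RCU grace period has elapsed since the last use. */
struct park_work {
	struct rcu_work work;
	int epoch;
};

static void __i915_globals_park(struct work_struct *work)
{
	struct park_work *wrk = container_of(work, typeof(*wrk), work.work);

	/* Abort the shrink if anyone unparked while we waited. */
	if (wrk->epoch == atomic_read(&epoch)) {
		i915_global_active_shrink();
		/* ... shrink the remaining global slab caches here ... */
	}

	kfree(wrk);
}

/* Hypothetical tail of i915_globals_park(), after wrk->epoch is sampled:
 * queue_rcu_work() only runs the work once a grace period has passed. */
static void queue_park(struct park_work *wrk)
{
	INIT_RCU_WORK(&wrk->work, __i915_globals_park);
	queue_rcu_work(system_wq, &wrk->work);
}
```

With that in place, each converted cache only has to export a `*_shrink()` hook; the deferral itself is written once.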
> > +void i915_globals_park(void)
> > +{
> > + struct park_work *wrk;
> > +
> > + /*
> > + * Defer shrinking the global slab caches (and other work) until
> > + * after a RCU grace period has completed with no activity. This
> > + * is to try and reduce the latency impact on the consumers caused
> > + * by us shrinking the caches the same time as they are trying to
> > + * allocate, with the assumption being that if we idle long enough
> > + * for an RCU grace period to elapse since the last use, it is likely
> > + * to be longer until we need the caches again.
> > + */
> > + if (!atomic_dec_and_test(&active))
> > + return;
> > +
> > + wrk = kmalloc(sizeof(*wrk), GFP_KERNEL);
> > + if (!wrk)
> > + return;
> > +
> > + wrk->epoch = atomic_inc_return(&epoch);
>
> Do you need to bump the epoch here?
Strictly, no. It does no harm, and it provides an explicit memory barrier
(atomic_inc_return is fully ordered) plus a known-unique value for our
sampling.
> Unpark would bump it so
> automatically when rcu work gets to run it would fail already. Like this
> it sounds like double increment. I don't see a problem with the double
> increment I just failed to spot if it is actually needed for some subtle
> reason. There would be a potential race with multiple device park
> callers storing the same epoch but is that really a problem? Again, as
> soon as someone unparks it seems like it would be the right thing.
I did wonder if we could make use of it, but for the moment, all I can
say is that it may make debugging slightly easier.
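The race scenario above (several park callers storing the same epoch, or an unpark arriving before the deferred work runs) can be modelled in plain userspace C. Everything below is a stand-in, not the kernel API — `pending`, `park_worker()` and `shrink_count` are illustrative, and the RCU grace period is modelled by simply calling the worker later — but it shows why a stale worker degrades to a no-op:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Userspace model of the park/unpark handshake; all names illustrative. */
static atomic_int active;         /* count of unparked devices */
static atomic_int epoch;          /* bumped on idle/busy transitions */
static int shrink_count;          /* stands in for kmem_cache_shrink() */

struct park_work {
	int epoch;                /* epoch sampled when the work was queued */
};

static struct park_work *pending; /* stands in for the queued rcu_work */

static void unpark(void)
{
	if (atomic_fetch_add(&active, 1) == 0)
		atomic_fetch_add(&epoch, 1);
}

static void park(void)
{
	struct park_work *wrk;

	if (atomic_fetch_sub(&active, 1) != 1) /* not the last user */
		return;

	wrk = malloc(sizeof(*wrk));
	if (!wrk)
		return;

	wrk->epoch = atomic_fetch_add(&epoch, 1) + 1;
	pending = wrk;
}

/* Runs "after a grace period": shrink only if we are still idle. */
static void park_worker(void)
{
	struct park_work *wrk = pending;

	pending = NULL;
	if (!wrk)
		return;

	if (wrk->epoch == atomic_load(&epoch))
		shrink_count++;   /* would shrink each global cache */
	free(wrk);
}
```

If unpark() runs before the worker does, the sampled epoch no longer matches and the shrink is skipped, which is why the extra increment in park is harmless: at worst it uniquely tags a work item that an intervening unpark will invalidate anyway.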
-Chris
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx