On Fri, Apr 01, 2016 at 03:30:17PM +0300, Vladimir Davydov wrote:
> When we call __kmem_cache_shrink on memory cgroup removal, we need to
> synchronize kmem_cache->cpu_partial update with put_cpu_partial that
> might be running on other cpus. Currently, we achieve that by using
> kick_all_cpus_sync, which works as a system wide memory barrier. Though
> fast it is, this method has a flaw - it issues a lot of IPIs, which
> might hurt high performance or real-time workloads.
>
> To fix this, let's replace kick_all_cpus_sync with synchronize_sched.
> Although the latter one may take much longer to finish, it shouldn't be
> a problem in this particular case, because memory cgroups are destroyed
> asynchronously from a workqueue so that no user visible effects should
> be introduced. OTOH, it will save us from excessive IPIs when someone
> removes a cgroup.
>
> Anyway, even if using synchronize_sched turns out to take too long, we
> can always introduce a kind of __kmem_cache_shrink batching so that this
> method would only be called once per one cgroup destruction (not per
> each per memcg kmem cache as it is now).
>
> Reported-and-suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Vladimir Davydov <[email protected]>

Thanks!

Acked-by: Peter Zijlstra (Intel) <[email protected]>
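For anyone less familiar with the pattern, here is a userspace sketch (an analogy, not kernel code) of the guarantee the patch relies on: once the writer has lowered cpu_partial and waited for every CPU to pass a quiescent state, no put_cpu_partial-like section can still be acting on the old value, and nobody had to be interrupted with an IPI to get there. Threads stand in for CPUs, and per-thread counters stand in for the quiescent states RCU-sched actually tracks. The names fake_cpu, qs_count, wait_for_quiescent_states and run_demo are hypothetical, and treating the put_cpu_partial critical section as non-preemptible is an assumption of the sketch.

```c
/* Userspace analogy, not kernel code: threads play CPUs and
 * wait_for_quiescent_states() plays synchronize_sched().
 * Every name here is hypothetical. Build with -pthread. */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NR_FAKE_CPUS 4

static atomic_int cpu_partial;             /* tunable being lowered */
static atomic_int gp_done;                 /* set after the grace period */
static atomic_int stop;                    /* tells the fake CPUs to exit */
static atomic_int violations;              /* old value seen after grace period */
static atomic_long qs_count[NR_FAKE_CPUS]; /* per-CPU quiescent-state counter */

static void *fake_cpu(void *arg)
{
	int cpu = (int)(long)arg;

	while (!atomic_load(&stop)) {
		/* Critical section, standing in for put_cpu_partial()
		 * reading s->cpu_partial (assumed non-preemptible). */
		int limit = atomic_load(&cpu_partial);

		if (limit > 0 && atomic_load(&gp_done))
			atomic_fetch_add(&violations, 1);
		/* Leaving the section counts as a quiescent state. */
		atomic_fetch_add(&qs_count[cpu], 1);
	}
	return NULL;
}

/* Return once every fake CPU has passed a quiescent state, i.e. has
 * finished whatever section it was in when we started waiting. No
 * thread is interrupted; we only wait for each one to move on. */
static void wait_for_quiescent_states(void)
{
	long snap[NR_FAKE_CPUS];

	for (int i = 0; i < NR_FAKE_CPUS; i++)
		snap[i] = atomic_load(&qs_count[i]);
	for (int i = 0; i < NR_FAKE_CPUS; i++)
		while (atomic_load(&qs_count[i]) == snap[i])
			sched_yield();
}

/* Returns how many sections used the old cpu_partial value after the
 * grace period completed; with seq_cst atomics this is always 0. */
int run_demo(void)
{
	pthread_t tid[NR_FAKE_CPUS];

	atomic_store(&cpu_partial, 30);
	atomic_store(&gp_done, 0);
	atomic_store(&stop, 0);
	atomic_store(&violations, 0);
	for (int i = 0; i < NR_FAKE_CPUS; i++)
		atomic_store(&qs_count[i], 0);

	for (long i = 0; i < NR_FAKE_CPUS; i++)
		pthread_create(&tid[i], NULL, fake_cpu, (void *)i);

	atomic_store(&cpu_partial, 0); /* the update __kmem_cache_shrink makes */
	wait_for_quiescent_states();   /* synchronize_sched() stand-in */
	atomic_store(&gp_done, 1);
	wait_for_quiescent_states();   /* let each CPU run with gp_done set */

	atomic_store(&stop, 1);
	for (int i = 0; i < NR_FAKE_CPUS; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&violations);
}
```

Calling run_demo() from a main() should return 0: a section that read the old value must end (and bump its counter) before the waiter can see that counter advance, so it can never observe gp_done set. The trade-off matches the changelog: the waiter may spin for a while, but the other "CPUs" are never disturbed.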

