Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-10 Thread Paul E. McKenney
On Sun, Jun 10, 2018 at 10:40:17AM -0700, Shakeel Butt wrote:
> On Sun, Jun 10, 2018 at 9:32 AM Paul E. McKenney
>  wrote:
> >
> > On Sun, Jun 10, 2018 at 07:52:50AM -0700, Shakeel Butt wrote:
> > > On Sat, Jun 9, 2018 at 3:20 AM Vladimir Davydov  
> > > wrote:
> > > >
> > > > On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt wrote:
> > > > > The memcg kmem cache creation and deactivation (SLUB only) is
> > > > > asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> > > > > the process of creation or deactivation, the kernel may crash.
> > > > >
> > > > > Example of one such crash:
> > > > >   general protection fault:  [#1] SMP PTI
> > > > >   CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
> > > > >   ...
> > > > >   Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
> > > > >   RIP: 0010:has_cpu_slab
> > > > >   ...
> > > > >   Call Trace:
> > > > >   ? on_each_cpu_cond
> > > > >   __kmem_cache_shrink
> > > > >   kmemcg_cache_deact_after_rcu
> > > > >   kmemcg_deactivate_workfn
> > > > >   process_one_work
> > > > >   worker_thread
> > > > >   kthread
> > > > >   ret_from_fork+0x35/0x40
> > > > >
> > > > > To fix this race, on root kmem cache destruction, mark the cache as
> > > > > dying and flush the workqueue used for memcg kmem cache creation and
> > > > > deactivation.
> > > >
> > > > > @@ -845,6 +862,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> > > > >   if (unlikely(!s))
> > > > >   return;
> > > > >
> > > > > + flush_memcg_workqueue(s);
> > > > > +
> > > >
> > > > This should definitely help against async memcg_kmem_cache_create(),
> > > > but I'm afraid it doesn't eliminate the race with async destruction,
> > > > unfortunately, because the latter uses call_rcu_sched():
> > > >
> > > >   memcg_deactivate_kmem_caches
> > > >__kmem_cache_deactivate
> > > > slab_deactivate_memcg_cache_rcu_sched
> > > >  call_rcu_sched
> > > > kmem_cache_destroy
> > > >  shutdown_memcg_caches
> > > >   shutdown_cache
> > > >   memcg_deactivate_rcufn
> > > >
> > > >
> > > > Can we somehow flush those pending rcu requests?
> > >
> > > You are right and thanks for catching that. Now I am wondering if
> > > synchronize_sched() just before flush_workqueue() should be enough.
> > > Otherwise we might have to replace call_rcu_sched with
> > > synchronize_sched() in kmemcg_deactivate_workfn, which I would not
> > > prefer as that would hold up the kmem_cache workqueue.
> > >
> > > +Paul
> > >
> > > Paul, we have a situation something similar to the following pseudo code.
> > >
> > > CPU0:
> > > lock(l)
> > > if (!flag)
> > >   call_rcu_sched(callback);
> > > unlock(l)
> > > --
> > > CPU1:
> > > lock(l)
> > > flag = true
> > > unlock(l)
> > > synchronize_sched()
> > > --
> > >
> > > If CPU0 has already called call_rcu_sched(callback) and CPU1 later calls
> > > synchronize_sched(), is there any guarantee that, on return from
> > > synchronize_sched(), the rcu callback scheduled by CPU0 has already
> > > been executed?
> >
> > No.  There is no such guarantee.
> >
> > You instead want rcu_barrier_sched(), which waits for the callbacks from
> > all prior invocations of call_rcu_sched() to be invoked.
> >
> > Please note that synchronize_sched() is -not- sufficient.  It is only
> > guaranteed to wait for a grace period, not necessarily for all prior
> > callbacks.  This goes both directions because if there are no callbacks
> > in the system, then rcu_barrier_sched() is within its rights to return
> > immediately.
> >
> > So please make sure you use each of synchronize_sched() and
> > rcu_barrier_sched() to do the job that it was intended to do!  ;-)
> >
> > If your lock(l) is shorthand for spin_lock(&l), it looks to me like you
> > actually only need rcu_barrier_sched():
> >
> > CPU0:
> > spin_lock(&l);
> > if (!flag)
> >   call_rcu_sched(callback);
> > spin_unlock(&l);
> >
> > CPU1:
> > spin_lock(&l);
> > flag = true;
> > spin_unlock(&l);
> > /* At this point, no more callbacks will be registered. */
> > rcu_barrier_sched();
> > /* At this point, all registered callbacks will have been invoked. */
> >
> > On the other hand, if your "lock(l)" was instead shorthand for
> > rcu_read_lock_sched(), then you need -both- synchronize_sched() -and-
> > rcu_barrier().  And even then, you will be broken in -rt kernels.
> > (Which might or might not be a concern, depending on whether your code
> > matters to -rt kernels.)
> >
> > Make sense?
> 
> Thanks a lot, that was really helpful. The lock is actually
> mutex_lock. So, I think rcu_barrier_sched() should be sufficient.

Yes, with either spin_lock() or mutex_lock(), this should work.
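
[As a minimal sketch only: how the rcu_barrier_sched() conclusion above could
slot into the flush path of the v3 patch quoted later in this thread.
flush_memcg_workqueue() and memcg_params.dying appear in that patch and
slab_mutex is the existing slab_common.c lock; the workqueue name
memcg_kmem_cache_wq and the exact ordering shown are assumptions for
illustration, not the committed fix.]

static void flush_memcg_workqueue(struct kmem_cache *s)
{
        mutex_lock(&slab_mutex);
        /* Once dying is set, no new async create/deactivate work is queued. */
        s->memcg_params.dying = true;
        mutex_unlock(&slab_mutex);

        /*
         * Wait for callbacks already posted with call_rcu_sched() (the
         * deactivation path), since those callbacks are what queue the
         * deactivation work items...
         */
        rcu_barrier_sched();

        /* ...then wait for all pending create/deactivate work items. */
        flush_workqueue(memcg_kmem_cache_wq);
}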

Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-10 Thread Shakeel Butt
On Sun, Jun 10, 2018 at 9:32 AM Paul E. McKenney
 wrote:
>
> On Sun, Jun 10, 2018 at 07:52:50AM -0700, Shakeel Butt wrote:
> > On Sat, Jun 9, 2018 at 3:20 AM Vladimir Davydov  
> > wrote:
> > >
> > > On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt wrote:
> > > > The memcg kmem cache creation and deactivation (SLUB only) is
> > > > asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> > > > the process of creation or deactivation, the kernel may crash.
> > > >
> > > > Example of one such crash:
> > > >   general protection fault:  [#1] SMP PTI
> > > >   CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
> > > >   ...
> > > >   Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
> > > >   RIP: 0010:has_cpu_slab
> > > >   ...
> > > >   Call Trace:
> > > >   ? on_each_cpu_cond
> > > >   __kmem_cache_shrink
> > > >   kmemcg_cache_deact_after_rcu
> > > >   kmemcg_deactivate_workfn
> > > >   process_one_work
> > > >   worker_thread
> > > >   kthread
> > > >   ret_from_fork+0x35/0x40
> > > >
> > > > To fix this race, on root kmem cache destruction, mark the cache as
> > > > dying and flush the workqueue used for memcg kmem cache creation and
> > > > deactivation.
> > >
> > > > @@ -845,6 +862,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> > > >   if (unlikely(!s))
> > > >   return;
> > > >
> > > > + flush_memcg_workqueue(s);
> > > > +
> > >
> > > This should definitely help against async memcg_kmem_cache_create(),
> > > but I'm afraid it doesn't eliminate the race with async destruction,
> > > unfortunately, because the latter uses call_rcu_sched():
> > >
> > >   memcg_deactivate_kmem_caches
> > >__kmem_cache_deactivate
> > > slab_deactivate_memcg_cache_rcu_sched
> > >  call_rcu_sched
> > > kmem_cache_destroy
> > >  shutdown_memcg_caches
> > >   shutdown_cache
> > >   memcg_deactivate_rcufn
> > >
> > >
> > > Can we somehow flush those pending rcu requests?
> >
> > You are right and thanks for catching that. Now I am wondering if
> > synchronize_sched() just before flush_workqueue() should be enough.
> > Otherwise we might have to replace call_rcu_sched with
> > synchronize_sched() in kmemcg_deactivate_workfn, which I would not
> > prefer as that would hold up the kmem_cache workqueue.
> >
> > +Paul
> >
> > Paul, we have a situation something similar to the following pseudo code.
> >
> > CPU0:
> > lock(l)
> > if (!flag)
> >   call_rcu_sched(callback);
> > unlock(l)
> > --
> > CPU1:
> > lock(l)
> > flag = true
> > unlock(l)
> > synchronize_sched()
> > --
> >
> > If CPU0 has already called call_rcu_sched(callback) and CPU1 later calls
> > synchronize_sched(), is there any guarantee that, on return from
> > synchronize_sched(), the rcu callback scheduled by CPU0 has already
> > been executed?
>
> No.  There is no such guarantee.
>
> You instead want rcu_barrier_sched(), which waits for the callbacks from
> all prior invocations of call_rcu_sched() to be invoked.
>
> Please note that synchronize_sched() is -not- sufficient.  It is only
> guaranteed to wait for a grace period, not necessarily for all prior
> callbacks.  This goes both directions because if there are no callbacks
> in the system, then rcu_barrier_sched() is within its rights to return
> immediately.
>
> So please make sure you use each of synchronize_sched() and
> rcu_barrier_sched() to do the job that it was intended to do!  ;-)
>
> If your lock(l) is shorthand for spin_lock(&l), it looks to me like you
> actually only need rcu_barrier_sched():
>
> CPU0:
> spin_lock(&l);
> if (!flag)
>   call_rcu_sched(callback);
> spin_unlock(&l);
>
> CPU1:
> spin_lock(&l);
> flag = true;
> spin_unlock(&l);
> /* At this point, no more callbacks will be registered. */
> rcu_barrier_sched();
> /* At this point, all registered callbacks will have been invoked. */
>
> On the other hand, if your "lock(l)" was instead shorthand for
> rcu_read_lock_sched(), then you need -both- synchronize_sched() -and-
> rcu_barrier().  And even then, you will be broken in -rt kernels.
> (Which might or might not be a concern, depending on whether your code
> matters to -rt kernels.)
>
> Make sense?
>

Thanks a lot, that was really helpful. The lock is actually
mutex_lock. So, I think rcu_barrier_sched() should be sufficient.

Shakeel


Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-10 Thread Paul E. McKenney
On Sun, Jun 10, 2018 at 07:52:50AM -0700, Shakeel Butt wrote:
> On Sat, Jun 9, 2018 at 3:20 AM Vladimir Davydov  
> wrote:
> >
> > On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt wrote:
> > > The memcg kmem cache creation and deactivation (SLUB only) is
> > > asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> > > the process of creation or deactivation, the kernel may crash.
> > >
> > > Example of one such crash:
> > >   general protection fault:  [#1] SMP PTI
> > >   CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
> > >   ...
> > >   Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
> > >   RIP: 0010:has_cpu_slab
> > >   ...
> > >   Call Trace:
> > >   ? on_each_cpu_cond
> > >   __kmem_cache_shrink
> > >   kmemcg_cache_deact_after_rcu
> > >   kmemcg_deactivate_workfn
> > >   process_one_work
> > >   worker_thread
> > >   kthread
> > >   ret_from_fork+0x35/0x40
> > >
> > > To fix this race, on root kmem cache destruction, mark the cache as
> > > dying and flush the workqueue used for memcg kmem cache creation and
> > > deactivation.
> >
> > > @@ -845,6 +862,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> > >   if (unlikely(!s))
> > >   return;
> > >
> > > + flush_memcg_workqueue(s);
> > > +
> >
> > This should definitely help against async memcg_kmem_cache_create(),
> > but I'm afraid it doesn't eliminate the race with async destruction,
> > unfortunately, because the latter uses call_rcu_sched():
> >
> >   memcg_deactivate_kmem_caches
> >__kmem_cache_deactivate
> > slab_deactivate_memcg_cache_rcu_sched
> >  call_rcu_sched
> > kmem_cache_destroy
> >  shutdown_memcg_caches
> >   shutdown_cache
> >   memcg_deactivate_rcufn
> >
> >
> > Can we somehow flush those pending rcu requests?
> 
> You are right and thanks for catching that. Now I am wondering if
> synchronize_sched() just before flush_workqueue() should be enough.
> Otherwise we might have to replace call_rcu_sched with
> synchronize_sched() in kmemcg_deactivate_workfn, which I would not
> prefer as that would hold up the kmem_cache workqueue.
> 
> +Paul
> 
> Paul, we have a situation something similar to the following pseudo code.
> 
> CPU0:
> lock(l)
> if (!flag)
>   call_rcu_sched(callback);
> unlock(l)
> --
> CPU1:
> lock(l)
> flag = true
> unlock(l)
> synchronize_sched()
> --
> 
> If CPU0 has already called call_rcu_sched(callback) and CPU1 later calls
> synchronize_sched(), is there any guarantee that, on return from
> synchronize_sched(), the rcu callback scheduled by CPU0 has already
> been executed?

No.  There is no such guarantee.

You instead want rcu_barrier_sched(), which waits for the callbacks from
all prior invocations of call_rcu_sched() to be invoked.

Please note that synchronize_sched() is -not- sufficient.  It is only
guaranteed to wait for a grace period, not necessarily for all prior
callbacks.  This goes both directions because if there are no callbacks
in the system, then rcu_barrier_sched() is within its rights to return
immediately.

So please make sure you use each of synchronize_sched() and
rcu_barrier_sched() to do the job that it was intended to do!  ;-)

If your lock(l) is shorthand for spin_lock(&l), it looks to me like you
actually only need rcu_barrier_sched():

CPU0:
spin_lock(&l);
if (!flag)
  call_rcu_sched(callback);
spin_unlock(&l);

CPU1:
spin_lock(&l);
flag = true;
spin_unlock(&l);
/* At this point, no more callbacks will be registered. */
rcu_barrier_sched();
/* At this point, all registered callbacks will have been invoked. */

On the other hand, if your "lock(l)" was instead shorthand for
rcu_read_lock_sched(), then you need -both- synchronize_sched() -and-
rcu_barrier().  And even then, you will be broken in -rt kernels.
(Which might or might not be a concern, depending on whether your code
matters to -rt kernels.)

Make sense?

Thanx, Paul
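
[For illustration of the rcu_read_lock_sched() case described above: the
updater then needs a grace period (so readers that saw the old flag value
have finished and posted their callbacks) followed by a callback barrier.
Hypothetical sketch, not code from the patch under discussion;
rcu_barrier_sched() is used here on the assumption that the callbacks were
posted with call_rcu_sched().]

/* CPU0: sched-RCU reader that may post a callback. */
rcu_read_lock_sched();
if (!READ_ONCE(flag))
        call_rcu_sched(&obj->rcu_head, callback);
rcu_read_unlock_sched();

/* CPU1: updater. */
WRITE_ONCE(flag, true);
synchronize_sched();    /* readers that might still post a callback finish */
rcu_barrier_sched();    /* then all posted callbacks are invoked */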



Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-10 Thread Shakeel Butt
On Sat, Jun 9, 2018 at 3:20 AM Vladimir Davydov  wrote:
>
> On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt wrote:
> > The memcg kmem cache creation and deactivation (SLUB only) is
> > asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> > the process of creation or deactivation, the kernel may crash.
> >
> > Example of one such crash:
> >   general protection fault:  [#1] SMP PTI
> >   CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
> >   ...
> >   Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
> >   RIP: 0010:has_cpu_slab
> >   ...
> >   Call Trace:
> >   ? on_each_cpu_cond
> >   __kmem_cache_shrink
> >   kmemcg_cache_deact_after_rcu
> >   kmemcg_deactivate_workfn
> >   process_one_work
> >   worker_thread
> >   kthread
> >   ret_from_fork+0x35/0x40
> >
> > To fix this race, on root kmem cache destruction, mark the cache as
> > dying and flush the workqueue used for memcg kmem cache creation and
> > deactivation.
>
> > @@ -845,6 +862,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >   if (unlikely(!s))
> >   return;
> >
> > + flush_memcg_workqueue(s);
> > +
>
> This should definitely help against async memcg_kmem_cache_create(),
> but I'm afraid it doesn't eliminate the race with async destruction,
> unfortunately, because the latter uses call_rcu_sched():
>
>   memcg_deactivate_kmem_caches
>__kmem_cache_deactivate
> slab_deactivate_memcg_cache_rcu_sched
>  call_rcu_sched
> kmem_cache_destroy
>  shutdown_memcg_caches
>   shutdown_cache
>   memcg_deactivate_rcufn
>
>
> Can we somehow flush those pending rcu requests?

You are right and thanks for catching that. Now I am wondering if
synchronize_sched() just before flush_workqueue() should be enough.
Otherwise we might have to replace call_rcu_sched with
synchronize_sched() in kmemcg_deactivate_workfn, which I would not
prefer as that would hold up the kmem_cache workqueue.

+Paul

Paul, we have a situation something similar to the following pseudo code.

CPU0:
lock(l)
if (!flag)
  call_rcu_sched(callback);
unlock(l)
--
CPU1:
lock(l)
flag = true
unlock(l)
synchronize_sched()
--

If CPU0 has already called call_rcu_sched(callback) and CPU1 later calls
synchronize_sched(), is there any guarantee that, on return from
synchronize_sched(), the rcu callback scheduled by CPU0 has already
been executed?

thanks,
Shakeel


Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-09 Thread Vladimir Davydov
On Tue, May 29, 2018 at 05:12:04PM -0700, Shakeel Butt wrote:
> The memcg kmem cache creation and deactivation (SLUB only) is
> asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> the process of creation or deactivation, the kernel may crash.
> 
> Example of one such crash:
>   general protection fault:  [#1] SMP PTI
>   CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
>   ...
>   Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
>   RIP: 0010:has_cpu_slab
>   ...
>   Call Trace:
>   ? on_each_cpu_cond
>   __kmem_cache_shrink
>   kmemcg_cache_deact_after_rcu
>   kmemcg_deactivate_workfn
>   process_one_work
>   worker_thread
>   kthread
>   ret_from_fork+0x35/0x40
> 
> To fix this race, on root kmem cache destruction, mark the cache as
> dying and flush the workqueue used for memcg kmem cache creation and
> deactivation.

> @@ -845,6 +862,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
>   if (unlikely(!s))
>   return;
>  
> + flush_memcg_workqueue(s);
> +

This should definitely help against async memcg_kmem_cache_create(),
but I'm afraid it doesn't eliminate the race with async destruction,
unfortunately, because the latter uses call_rcu_sched():

  memcg_deactivate_kmem_caches
   __kmem_cache_deactivate
slab_deactivate_memcg_cache_rcu_sched
 call_rcu_sched
kmem_cache_destroy
 shutdown_memcg_caches
  shutdown_cache
  memcg_deactivate_rcufn
   

Can we somehow flush those pending rcu requests?
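
[To make the window described above concrete: in the 4.17-era
mm/slab_common.c the deactivation work item is only queued from the
sched-RCU callback, so at kmem_cache_destroy() time the work may not yet be
on the workqueue at all and a plain flush_workqueue() can miss it. Abridged
paraphrase of that code from memory; field and workqueue names may differ
in detail.]

void slab_deactivate_memcg_cache_rcu_sched(struct kmem_cache *s,
                                           void (*deact_fn)(struct kmem_cache *))
{
        ...
        s->memcg_params.deact_fn = deact_fn;
        /* Nothing is queued on the workqueue yet, only an RCU callback. */
        call_rcu_sched(&s->memcg_params.deact_rcu_head, kmemcg_deactivate_rcufn);
}

static void kmemcg_deactivate_rcufn(struct rcu_head *head)
{
        struct kmem_cache *s = container_of(head, struct kmem_cache,
                                            memcg_params.deact_rcu_head);

        /* Only here does the deactivation work become visible to a flush. */
        INIT_WORK(&s->memcg_params.deact_work, kmemcg_deactivate_workfn);
        queue_work(memcg_kmem_cache_wq, &s->memcg_params.deact_work);
}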


Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-06-08 Thread Andrew Morton
On Tue, 29 May 2018 17:12:04 -0700 Shakeel Butt  wrote:

> The memcg kmem cache creation and deactivation (SLUB only) is
> asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> the process of creation or deactivation, the kernel may crash.
> 
> Example of one such crash:
>   general protection fault:  [#1] SMP PTI
>   CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
>   ...
>   Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
>   RIP: 0010:has_cpu_slab
>   ...
>   Call Trace:
>   ? on_each_cpu_cond
>   __kmem_cache_shrink
>   kmemcg_cache_deact_after_rcu
>   kmemcg_deactivate_workfn
>   process_one_work
>   worker_thread
>   kthread
>   ret_from_fork+0x35/0x40
> 
> To fix this race, on root kmem cache destruction, mark the cache as
> dying and flush the workqueue used for memcg kmem cache creation and
> deactivation.

We have a distinct lack of reviewers and testers on this one.  Please.

Vladimir, v3 replaced the refcounting with workqueue flushing, as you
suggested...


From: Shakeel Butt 
Subject: mm: fix race between kmem_cache destroy, create and deactivate

The memcg kmem cache creation and deactivation (SLUB only) is
asynchronous.  If a root kmem cache is destroyed whose memcg cache is in
the process of creation or deactivation, the kernel may crash.

Example of one such crash:
general protection fault:  [#1] SMP PTI
CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
...
Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
RIP: 0010:has_cpu_slab
...
Call Trace:
? on_each_cpu_cond
__kmem_cache_shrink
kmemcg_cache_deact_after_rcu
kmemcg_deactivate_workfn
process_one_work
worker_thread
kthread
ret_from_fork+0x35/0x40

To fix this race, on root kmem cache destruction, mark the cache as dying
and flush the workqueue used for memcg kmem cache creation and
deactivation.

[shake...@google.com: add more documentation, rename fields for readability]
  Link: http://lkml.kernel.org/r/20180522201336.196994-1-shake...@google.com
[a...@linux-foundation.org: fix build, per Shakeel]
[shake...@google.com: v3.  Instead of refcount, flush the workqueue]
  Link: http://lkml.kernel.org/r/20180530001204.183758-1-shake...@google.com
Link: http://lkml.kernel.org/r/20180521174116.171846-1-shake...@google.com
Signed-off-by: Shakeel Butt 
Cc: Michal Hocko 
Cc: Greg Thelen 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Johannes Weiner 
Cc: Vladimir Davydov 
Cc: Tejun Heo 
Signed-off-by: Andrew Morton 
---

 include/linux/slab.h |1 +
 mm/slab_common.c |   21 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff -puN 
include/linux/slab_def.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate
 include/linux/slab_def.h
diff -puN 
include/linux/slab.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate
 include/linux/slab.h
--- 
a/include/linux/slab.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate
+++ a/include/linux/slab.h
@@ -600,6 +600,7 @@ struct memcg_cache_params {
struct memcg_cache_array __rcu *memcg_caches;
struct list_head __root_caches_node;
struct list_head children;
+   bool dying;
};
struct {
struct mem_cgroup *memcg;
diff -puN 
include/linux/slub_def.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate
 include/linux/slub_def.h
diff -puN 
mm/memcontrol.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate 
mm/memcontrol.c
diff -puN 
mm/slab.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate mm/slab.c
diff -puN 
mm/slab_common.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate 
mm/slab_common.c
--- 
a/mm/slab_common.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate
+++ a/mm/slab_common.c
@@ -136,6 +136,7 @@ void slab_init_memcg_params(struct kmem_
s->memcg_params.root_cache = NULL;
RCU_INIT_POINTER(s->memcg_params.memcg_caches, NULL);
INIT_LIST_HEAD(&s->memcg_params.children);
+   s->memcg_params.dying = false;
 }
 
 static int init_memcg_params(struct kmem_cache *s,
@@ -608,7 +609,7 @@ void memcg_create_kmem_cache(struct mem_
 * The memory cgroup could have been offlined while the cache
 * creation work was pending.
 */
-   if (memcg->kmem_state != KMEM_ONLINE)
+   if (memcg->kmem_state != KMEM_ONLINE || root_cache->memcg_params.dying)
goto out_unlock;
 
idx = memcg_cache_id(memcg);
@@ -712,6 +713,9 @@ void slab_deactivate_memcg_cache_rcu_sch
WARN_ON_ONCE(s->memcg_params.deact_fn))
return;
 
+   if (s->memcg_params.root_cache->memcg_params.dying)
+   return;
+
  

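[The hunk adding flush_memcg_workqueue() itself is cut off in the quote
above. Going by the changelog ("mark the cache as dying and flush the
workqueue"), v3 plausibly does something along these lines; this is a
reconstruction for readability, not the verbatim patch, and the workqueue
name memcg_kmem_cache_wq is assumed.]

static void flush_memcg_workqueue(struct kmem_cache *s)
{
        mutex_lock(&slab_mutex);
        s->memcg_params.dying = true;
        mutex_unlock(&slab_mutex);

        /*
         * Make sure any asynchronously queued memcg cache creation or
         * deactivation work for this root cache has finished before the
         * cache is torn down.
         */
        flush_workqueue(memcg_kmem_cache_wq);
}
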
Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-05-31 Thread Shakeel Butt
On Thu, May 31, 2018 at 5:18 PM, Andrew Morton
 wrote:
> On Tue, 29 May 2018 17:12:04 -0700 Shakeel Butt  wrote:
>
>> The memcg kmem cache creation and deactivation (SLUB only) is
>> asynchronous. If a root kmem cache is destroyed whose memcg cache is in
>> the process of creation or deactivation, the kernel may crash.
>>
>> Example of one such crash:
>>   general protection fault:  [#1] SMP PTI
>>   CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
>>   ...
>>   Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
>>   RIP: 0010:has_cpu_slab
>>   ...
>>   Call Trace:
>>   ? on_each_cpu_cond
>>   __kmem_cache_shrink
>>   kmemcg_cache_deact_after_rcu
>>   kmemcg_deactivate_workfn
>>   process_one_work
>>   worker_thread
>>   kthread
>>   ret_from_fork+0x35/0x40
>>
>> To fix this race, on root kmem cache destruction, mark the cache as
>> dying and flush the workqueue used for memcg kmem cache creation and
>> deactivation.
>>
>> Signed-off-by: Shakeel Butt 
>> ---
>> Changelog since v2:
>> - Instead of refcount, flush the workqueue
>
> This one-liner doesn't appear to fully describe the difference between
> v2 and v3, which is rather large:
>

Sorry about that, I should have explained more. The diff between v2 and
v3 is large because v3 is a complete rewrite, so the diff is effectively
a revert of v2 followed by the v3 patch. If you drop all the previous
versions and just keep v3, it will be smaller.

thanks,
Shakeel


Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

2018-05-31 Thread Andrew Morton
On Tue, 29 May 2018 17:12:04 -0700 Shakeel Butt  wrote:

> The memcg kmem cache creation and deactivation (SLUB only) is
> asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> the process of creation or deactivation, the kernel may crash.
> 
> Example of one such crash:
>   general protection fault:  [#1] SMP PTI
>   CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
>   ...
>   Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
>   RIP: 0010:has_cpu_slab
>   ...
>   Call Trace:
>   ? on_each_cpu_cond
>   __kmem_cache_shrink
>   kmemcg_cache_deact_after_rcu
>   kmemcg_deactivate_workfn
>   process_one_work
>   worker_thread
>   kthread
>   ret_from_fork+0x35/0x40
> 
> To fix this race, on root kmem cache destruction, mark the cache as
> dying and flush the workqueue used for memcg kmem cache creation and
> deactivation.
> 
> Signed-off-by: Shakeel Butt 
> ---
> Changelog since v2:
> - Instead of refcount, flush the workqueue

This one-liner doesn't appear to fully describe the difference between
v2 and v3, which is rather large:

 include/linux/slab.h |3 
 include/linux/slab_def.h |5 -
 include/linux/slub_def.h |3 
 mm/memcontrol.c  |7 -
 mm/slab.c|4 -
 mm/slab.h|6 -
 mm/slab_common.c |  139 ++---
 mm/slub.c|   14 +--
 8 files changed, 51 insertions(+), 130 deletions(-)

diff -puN 
include/linux/slab_def.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3
 include/linux/slab_def.h
--- 
a/include/linux/slab_def.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3
+++ a/include/linux/slab_def.h
@@ -41,10 +41,7 @@ struct kmem_cache {
 /* 4) cache creation/removal */
const char *name;
struct list_head list;
-   /* Refcount for root kmem caches */
-   refcount_t refcount;
-   /* Number of root kmem caches sharing this cache */
-   int shared_count;
+   int refcount;
int object_size;
int align;
 
diff -puN 
include/linux/slab.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3
 include/linux/slab.h
--- 
a/include/linux/slab.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3
+++ a/include/linux/slab.h
@@ -599,6 +599,7 @@ struct memcg_cache_params {
struct memcg_cache_array __rcu *memcg_caches;
struct list_head __root_caches_node;
struct list_head children;
+   bool dying;
};
struct {
struct mem_cgroup *memcg;
@@ -615,8 +616,6 @@ struct memcg_cache_params {
 };
 
 int memcg_update_all_caches(int num_memcgs);
-bool kmem_cache_tryget(struct kmem_cache *s);
-void kmem_cache_put(struct kmem_cache *s);
 
 /**
  * kmalloc_array - allocate memory for an array.
diff -puN 
include/linux/slub_def.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3
 include/linux/slub_def.h
--- 
a/include/linux/slub_def.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3
+++ a/include/linux/slub_def.h
@@ -97,8 +97,7 @@ struct kmem_cache {
struct kmem_cache_order_objects max;
struct kmem_cache_order_objects min;
gfp_t allocflags;   /* gfp flags to use on each alloc */
-   refcount_t refcount;/* Refcount for root kmem cache */
-   int shared_count;   /* Number of kmem caches sharing this cache */
+   int refcount;   /* Refcount for slab cache destroy */
void (*ctor)(void *);
unsigned int inuse; /* Offset to metadata */
unsigned int align; /* Alignment */
diff -puN 
mm/memcontrol.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3 
mm/memcontrol.c
--- 
a/mm/memcontrol.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3
+++ a/mm/memcontrol.c
@@ -2181,7 +2181,6 @@ static void memcg_kmem_cache_create_func
memcg_create_kmem_cache(memcg, cachep);
 
css_put(&memcg->css);
-   kmem_cache_put(cachep);
kfree(cw);
 }
 
@@ -2197,12 +2196,6 @@ static void __memcg_schedule_kmem_cache_
if (!cw)
return;
 
-   /* Make sure root kmem cache does not get destroyed in the middle */
-   if (!kmem_cache_tryget(cachep)) {
-   kfree(cw);
-   return;
-   }
-
css_get(&memcg->css);
 
cw->memcg = memcg;
diff -puN 
mm/slab.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3 
mm/slab.c
--- a/mm/slab.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate-v3
+++ a/mm/slab.c
@@ -1883,8 +1883,8 @@ __kmem_cache_alias(const char *name, uns
struct kmem_cache *cachep;
 
cachep = find_mergeable(size, align, flags, name, ctor);
-   if (cachep && kmem_cache_tryget(cachep)) {
-   cachep->shared_count++;
+