Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-10-02 Thread Yang Shi



On 10/2/17 4:20 AM, Michal Hocko wrote:

On Thu 28-09-17 13:36:57, Tetsuo Handa wrote:

On 2017/09/28 6:46, Yang Shi wrote:

Changelog v7 -> v8:
* Adopted Michal's suggestion to dump unreclaimable slab info when the amount of
unreclaimable slabs > total user memory, not only in the oom panic path.
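
For reference, a sketch of how the "unreclaimable slabs > total user memory"
trigger could be expressed. The helper name and the exact set of LRU counters
summed below are illustrative assumptions, not quoted from the patch:

static bool should_dump_unreclaim_slab(void)
{
	unsigned long nr_lru;

	/* Approximate "total user memory" by the LRU and isolated page counts. */
	nr_lru = global_node_page_state(NR_ACTIVE_ANON) +
		 global_node_page_state(NR_INACTIVE_ANON) +
		 global_node_page_state(NR_ACTIVE_FILE) +
		 global_node_page_state(NR_INACTIVE_FILE) +
		 global_node_page_state(NR_ISOLATED_ANON) +
		 global_node_page_state(NR_ISOLATED_FILE) +
		 global_node_page_state(NR_UNEVICTABLE);

	/* Dump the unreclaimable slab info only when it dominates user memory. */
	return global_node_page_state(NR_SLAB_UNRECLAIMABLE) > nr_lru;
}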


Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since v2
because there are

mutex_lock(&slab_mutex);
kmalloc(GFP_KERNEL);
mutex_unlock(&slab_mutex);

users. If we call dump_unreclaimable_slab() on the non-panic OOM path, aren't we
introducing a risk of crash (i.e. kernel panic) for the regular OOM path?


yes we are
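
Concretely, the risk is a self-deadlock: the allocation that triggers the OOM
may itself be running under slab_mutex. A minimal sketch, with a hypothetical
caller standing in for the real slab_mutex/GFP_KERNEL users mentioned above:

static void *alloc_under_slab_mutex(size_t size)
{
	void *p;

	mutex_lock(&slab_mutex);
	/*
	 * This allocation may enter out_of_memory(); if the OOM path then
	 * took mutex_lock(&slab_mutex) to dump the slab info, the task
	 * would block forever on the mutex it already holds.
	 */
	p = kmalloc(size, GFP_KERNEL);
	mutex_unlock(&slab_mutex);
	return p;
}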
  

We can try mutex_trylock() from dump_unreclaimable_slab() at best.
But it still remains unsafe, doesn't it?


using the trylock sounds like a reasonable compromise.


OK, it sounds like we have reached agreement on the trylock. I will address
those comments in v9.
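
For concreteness, a minimal sketch of what a trylock-based
dump_unreclaimable_slab() could look like; the names follow the discussion in
this thread, while the warning text and output format are illustrative
assumptions:

void dump_unreclaimable_slab(void)
{
	struct kmem_cache *s, *s2;
	struct slabinfo sinfo;

	/* Never sleep on slab_mutex from the OOM path; skip the dump if it is busy. */
	if (!mutex_trylock(&slab_mutex)) {
		pr_warn("excessive unreclaimable slab but cannot dump stats\n");
		return;
	}

	pr_info("Unreclaimable slab info:\n");
	list_for_each_entry_safe(s, s2, &slab_caches, list) {
		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
			continue;

		memset(&sinfo, 0, sizeof(sinfo));
		get_slabinfo(s, &sinfo);

		if (sinfo.num_objs > 0)
			pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
				(sinfo.active_objs * s->size) / 1024,
				(sinfo.num_objs * s->size) / 1024);
	}
	mutex_unlock(&slab_mutex);
}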


Thanks,
Yang





Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-10-02 Thread Yang Shi



On 9/30/17 4:00 AM, Tetsuo Handa wrote:

Yang Shi wrote:

On 9/28/17 1:45 PM, Tetsuo Handa wrote:

Yang Shi wrote:

On 9/28/17 12:57 PM, Tetsuo Handa wrote:

Yang Shi wrote:

On 9/27/17 9:36 PM, Tetsuo Handa wrote:

On 2017/09/28 6:46, Yang Shi wrote:

Changelog v7 -> v8:
* Adopted Michal's suggestion to dump unreclaimable slab info when the amount of
unreclaimable slabs > total user memory, not only in the oom panic path.


Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since v2
because there are

mutex_lock(&slab_mutex);
kmalloc(GFP_KERNEL);
mutex_unlock(&slab_mutex);

users. If we call dump_unreclaimable_slab() on the non-panic OOM path, aren't we
introducing a risk of crash (i.e. kernel panic) for the regular OOM path?


I don't see the difference between the regular oom path and the oom panic path
other than calling panic() at the end.

And, the slab dump may be called by the panic path too; it covers both the
regular and the panic path.


Calling a function that might cause a kernel oops immediately before calling
panic() would be tolerable, for the kernel will panic after all. But calling a
function that might cause a kernel oops when there is no plan to call panic()
is a bug.


I got your point. slab_mutex is used to protect the list of all the slabs;
since we are already in oom, there should be no kmem cache destroy happening
during the list traversal. And list_for_each_entry() has been replaced with
list_for_each_entry_safe() to make the traversal more robust.


I consider that an OOM event and a kmem cache destroy event can run
concurrently, because slab_mutex is not held by the OOM event (and
unfortunately cannot be held, due to the possibility of deadlock) in order to
protect the list of all the slabs.

I don't think replacing list_for_each_entry() with list_for_each_entry_safe()
makes the traversal more robust, for list_for_each_entry_safe() does not defer
freeing of the memory used by a list element. Rather, replacing
list_for_each_entry() with list_for_each_entry_rcu() (and making the relevant
changes such as rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will make
the traversal safe.


I'm not sure RCU could satisfy this case. RCU can only protect the
slab_caches_to_rcu_destroy list, which is used by SLAB_TYPESAFE_BY_RCU
slabs.


I'm not sure why you are talking about SLAB_TYPESAFE_BY_RCU.
What I meant is that

  Upon registration:

	// do initialize/setup stuff here
	synchronize_rcu(); // <= for dump_unreclaimable_slab()
	list_add_rcu(&kmem_cache->list, &slab_caches);

  Upon unregistration:

	list_del_rcu(&kmem_cache->list);
	synchronize_rcu(); // <= for dump_unreclaimable_slab()
	// do finalize/cleanup stuff here

then (if my understanding is correct)

rcu_read_lock();
list_for_each_entry_rcu(s, &slab_caches, list) {
	if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
		continue;

	memset(&sinfo, 0, sizeof(sinfo));
	get_slabinfo(s, &sinfo);

	if (sinfo.num_objs > 0)
		pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
			(sinfo.active_objs * s->size) / 1024,
			(sinfo.num_objs * s->size) / 1024);
}
rcu_read_unlock();

will make dump_unreclaimable_slab() safe.


Thanks for the detailed description. However, it sounds like this change is
too much for slub; I'm not sure whether it might change subtle slub behavior.


trylock sounds like a good alternative.

Yang





Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-10-02 Thread Michal Hocko
On Thu 28-09-17 13:36:57, Tetsuo Handa wrote:
> On 2017/09/28 6:46, Yang Shi wrote:
> > Changelog v7 -> v8:
> > * Adopted Michal's suggestion to dump unreclaimable slab info when the amount
> > of unreclaimable slabs > total user memory, not only in the oom panic path.
> 
> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since v2
> because there are
> 
>   mutex_lock(&slab_mutex);
>   kmalloc(GFP_KERNEL);
>   mutex_unlock(&slab_mutex);
> 
> users. If we call dump_unreclaimable_slab() on the non-panic OOM path, aren't we
> introducing a risk of crash (i.e. kernel panic) for the regular OOM path?

yes we are
 
> We can try mutex_trylock() from dump_unreclaimable_slab() at best.
> But it still remains unsafe, doesn't it?

using the trylock sounds like a reasonable compromise.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-09-30 Thread Tetsuo Handa
Yang Shi wrote:
> On 9/28/17 1:45 PM, Tetsuo Handa wrote:
> > Yang Shi wrote:
> >> On 9/28/17 12:57 PM, Tetsuo Handa wrote:
> >>> Yang Shi wrote:
>  On 9/27/17 9:36 PM, Tetsuo Handa wrote:
> > On 2017/09/28 6:46, Yang Shi wrote:
> >> Changelog v7 -> v8:
> >> * Adopted Michal's suggestion to dump unreclaimable slab info when the
> >> amount of unreclaimable slabs > total user memory, not only in the oom
> >> panic path.
> >
> > Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since
> > v2 because there are
> >
> > mutex_lock(&slab_mutex);
> > kmalloc(GFP_KERNEL);
> > mutex_unlock(&slab_mutex);
> >
> > users. If we call dump_unreclaimable_slab() on the non-panic OOM path,
> > aren't we introducing a risk of crash (i.e. kernel panic) for the regular
> > OOM path?
> 
>  I don't see the difference between the regular oom path and the oom panic
>  path other than calling panic() at the end.
> 
>  And, the slab dump may be called by the panic path too; it covers both the
>  regular and the panic path.
> >>>
> >>> Calling a function that might cause a kernel oops immediately before
> >>> calling panic() would be tolerable, for the kernel will panic after all.
> >>> But calling a function that might cause a kernel oops when there is no
> >>> plan to call panic() is a bug.
> >>
> >> I got your point. slab_mutex is used to protect the list of all the
> >> slabs; since we are already in oom, there should be no kmem cache
> >> destroy happening during the list traversal. And list_for_each_entry()
> >> has been replaced with list_for_each_entry_safe() to make the traversal
> >> more robust.
> > 
> > I consider that an OOM event and a kmem cache destroy event can run
> > concurrently, because slab_mutex is not held by the OOM event (and
> > unfortunately cannot be held, due to the possibility of deadlock) in order
> > to protect the list of all the slabs.
> > 
> > I don't think replacing list_for_each_entry() with
> > list_for_each_entry_safe() makes the traversal more robust, for
> > list_for_each_entry_safe() does not defer freeing of the memory used by a
> > list element. Rather, replacing list_for_each_entry() with
> > list_for_each_entry_rcu() (and making the relevant changes such as
> > rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will make the
> > traversal safe.
> 
> I'm not sure RCU could satisfy this case. RCU can only protect the
> slab_caches_to_rcu_destroy list, which is used by SLAB_TYPESAFE_BY_RCU
> slabs.

I'm not sure why you are talking about SLAB_TYPESAFE_BY_RCU.
What I meant is that

  Upon registration:

	// do initialize/setup stuff here
	synchronize_rcu(); // <= for dump_unreclaimable_slab()
	list_add_rcu(&kmem_cache->list, &slab_caches);

  Upon unregistration:

	list_del_rcu(&kmem_cache->list);
	synchronize_rcu(); // <= for dump_unreclaimable_slab()
	// do finalize/cleanup stuff here

then (if my understanding is correct)

rcu_read_lock();
list_for_each_entry_rcu(s, &slab_caches, list) {
	if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
		continue;

	memset(&sinfo, 0, sizeof(sinfo));
	get_slabinfo(s, &sinfo);

	if (sinfo.num_objs > 0)
		pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
			(sinfo.active_objs * s->size) / 1024,
			(sinfo.num_objs * s->size) / 1024);
}
rcu_read_unlock();

will make dump_unreclaimable_slab() safe.


Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-09-29 Thread Yang Shi



On 9/28/17 1:45 PM, Tetsuo Handa wrote:

Yang Shi wrote:

On 9/28/17 12:57 PM, Tetsuo Handa wrote:

Yang Shi wrote:

On 9/27/17 9:36 PM, Tetsuo Handa wrote:

On 2017/09/28 6:46, Yang Shi wrote:

Changelog v7 -> v8:
* Adopted Michal's suggestion to dump unreclaimable slab info when the amount of
unreclaimable slabs > total user memory, not only in the oom panic path.


Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since v2
because there are

mutex_lock(&slab_mutex);
kmalloc(GFP_KERNEL);
mutex_unlock(&slab_mutex);

users. If we call dump_unreclaimable_slab() on the non-panic OOM path, aren't we
introducing a risk of crash (i.e. kernel panic) for the regular OOM path?


I don't see the difference between the regular oom path and the oom panic path
other than calling panic() at the end.

And, the slab dump may be called by the panic path too; it covers both the
regular and the panic path.


Calling a function that might cause a kernel oops immediately before calling
panic() would be tolerable, for the kernel will panic after all. But calling a
function that might cause a kernel oops when there is no plan to call panic()
is a bug.


I got your point. slab_mutex is used to protect the list of all the slabs;
since we are already in oom, there should be no kmem cache destroy happening
during the list traversal. And list_for_each_entry() has been replaced with
list_for_each_entry_safe() to make the traversal more robust.


I consider that an OOM event and a kmem cache destroy event can run
concurrently, because slab_mutex is not held by the OOM event (and
unfortunately cannot be held, due to the possibility of deadlock) in order to
protect the list of all the slabs.

I don't think replacing list_for_each_entry() with list_for_each_entry_safe()
makes the traversal more robust, for list_for_each_entry_safe() does not defer
freeing of the memory used by a list element. Rather, replacing
list_for_each_entry() with list_for_each_entry_rcu() (and making the relevant
changes such as rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will make
the traversal safe.


I'm not sure RCU could satisfy this case. RCU can only protect the
slab_caches_to_rcu_destroy list, which is used by SLAB_TYPESAFE_BY_RCU
slabs.


Yang





Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-09-28 Thread Tetsuo Handa
Yang Shi wrote:
> On 9/28/17 12:57 PM, Tetsuo Handa wrote:
> > Yang Shi wrote:
> >> On 9/27/17 9:36 PM, Tetsuo Handa wrote:
> >>> On 2017/09/28 6:46, Yang Shi wrote:
>  Changelog v7 -> v8:
>  * Adopted Michal's suggestion to dump unreclaimable slab info when the
>  amount of unreclaimable slabs > total user memory, not only in the oom
>  panic path.
> >>>
> >>> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided
> >>> since v2 because there are
> >>>
> >>>   mutex_lock(&slab_mutex);
> >>>   kmalloc(GFP_KERNEL);
> >>>   mutex_unlock(&slab_mutex);
> >>>
> >>> users. If we call dump_unreclaimable_slab() on the non-panic OOM path,
> >>> aren't we introducing a risk of crash (i.e. kernel panic) for the
> >>> regular OOM path?
> >>
> >> I don't see the difference between the regular oom path and the oom panic
> >> path other than calling panic() at the end.
> >>
> >> And, the slab dump may be called by the panic path too; it covers both the
> >> regular and the panic path.
> > 
> > Calling a function that might cause a kernel oops immediately before
> > calling panic() would be tolerable, for the kernel will panic after all.
> > But calling a function that might cause a kernel oops when there is no
> > plan to call panic() is a bug.
> 
> I got your point. slab_mutex is used to protect the list of all the
> slabs; since we are already in oom, there should be no kmem cache
> destroy happening during the list traversal. And list_for_each_entry() has
> been replaced with list_for_each_entry_safe() to make the traversal more
> robust.

I consider that an OOM event and a kmem cache destroy event can run
concurrently, because slab_mutex is not held by the OOM event (and
unfortunately cannot be held, due to the possibility of deadlock) in order to
protect the list of all the slabs.

I don't think replacing list_for_each_entry() with list_for_each_entry_safe()
makes the traversal more robust, for list_for_each_entry_safe() does not defer
freeing of the memory used by a list element. Rather, replacing
list_for_each_entry() with list_for_each_entry_rcu() (and making the relevant
changes such as rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will make
the traversal safe.
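
To spell out the hazard: list_for_each_entry_safe() only prefetches the next
element before each iteration, so a lockless traversal can still dereference a
kmem_cache that a concurrent kmem_cache_destroy() has already freed. A
simplified sketch of the two sides of the race (the destroy side is condensed;
the real code tears the cache down through its own release path):

	/* OOM-path traversal without slab_mutex: */
	list_for_each_entry_safe(s, s2, &slab_caches, list) {
		get_slabinfo(s, &sinfo);	/* 's' (or the prefetched 's2')  */
	}					/* may already be freed memory   */

	/* Concurrent cache destruction, under slab_mutex: */
	mutex_lock(&slab_mutex);
	list_del(&s->list);
	mutex_unlock(&slab_mutex);
	/*
	 * The kmem_cache itself is then freed with no grace period that a
	 * lockless reader could rely on, hence the RCU-based traversal
	 * suggested in the follow-up.
	 */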


Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-09-28 Thread Yang Shi



On 9/28/17 12:57 PM, Tetsuo Handa wrote:

Yang Shi wrote:

On 9/27/17 9:36 PM, Tetsuo Handa wrote:

On 2017/09/28 6:46, Yang Shi wrote:

Changelog v7 -> v8:
* Adopted Michal's suggestion to dump unreclaimable slab info when the amount of
unreclaimable slabs > total user memory, not only in the oom panic path.


Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since v2
because there are

mutex_lock(&slab_mutex);
kmalloc(GFP_KERNEL);
mutex_unlock(&slab_mutex);

users. If we call dump_unreclaimable_slab() on the non-panic OOM path, aren't we
introducing a risk of crash (i.e. kernel panic) for the regular OOM path?


I don't see the difference between the regular oom path and the oom panic path
other than calling panic() at the end.

And, the slab dump may be called by the panic path too; it covers both the
regular and the panic path.


Calling a function that might cause a kernel oops immediately before calling
panic() would be tolerable, for the kernel will panic after all. But calling a
function that might cause a kernel oops when there is no plan to call panic()
is a bug.


I got your point. slab_mutex is used to protect the list of all the slabs;
since we are already in oom, there should be no kmem cache destroy happening
during the list traversal. And list_for_each_entry() has been replaced with
list_for_each_entry_safe() to make the traversal more robust.


Thanks,
Yang





Thanks,
Yang



We can try mutex_trylock() from dump_unreclaimable_slab() at best.
But it still remains unsafe, doesn't it?





Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-09-28 Thread Tetsuo Handa
Yang Shi wrote:
> On 9/27/17 9:36 PM, Tetsuo Handa wrote:
> > On 2017/09/28 6:46, Yang Shi wrote:
> >> Changelog v7 -> v8:
> >> * Adopted Michal's suggestion to dump unreclaimable slab info when the
> >> amount of unreclaimable slabs > total user memory, not only in the oom panic path.
> > 
> > Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since
> > v2 because there are
> > 
> > mutex_lock(&slab_mutex);
> > kmalloc(GFP_KERNEL);
> > mutex_unlock(&slab_mutex);
> > 
> > users. If we call dump_unreclaimable_slab() on the non-panic OOM path,
> > aren't we introducing a risk of crash (i.e. kernel panic) for the regular
> > OOM path?
> 
> I don't see the difference between the regular oom path and the oom panic
> path other than calling panic() at the end.
> 
> And, the slab dump may be called by the panic path too; it covers both the
> regular and the panic path.

Calling a function that might cause a kernel oops immediately before calling
panic() would be tolerable, for the kernel will panic after all. But calling a
function that might cause a kernel oops when there is no plan to call panic()
is a bug.

> 
> Thanks,
> Yang
> 
> > 
> > We can try mutex_trylock() from dump_unreclaimable_slab() at best.
> > But it still remains unsafe, doesn't it?
> > 
> 


Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-09-28 Thread Yang Shi



On 9/27/17 9:36 PM, Tetsuo Handa wrote:

On 2017/09/28 6:46, Yang Shi wrote:

Changelog v7 -> v8:
* Adopted Michal's suggestion to dump unreclaimable slab info when the amount of
unreclaimable slabs > total user memory, not only in the oom panic path.


Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since v2
because there are

mutex_lock(&slab_mutex);
kmalloc(GFP_KERNEL);
mutex_unlock(&slab_mutex);

users. If we call dump_unreclaimable_slab() on the non-panic OOM path, aren't we
introducing a risk of crash (i.e. kernel panic) for the regular OOM path?


I don't see the difference between the regular oom path and the oom panic path
other than calling panic() at the end.


And, the slab dump may be called by the panic path too; it covers both the
regular and the panic path.


Thanks,
Yang



We can try mutex_trylock() from dump_unreclaimable_slab() at best.
But it still remains unsafe, doesn't it?



Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message

2017-09-27 Thread Tetsuo Handa
On 2017/09/28 6:46, Yang Shi wrote:
> Changelog v7 -> v8:
> * Adopted Michal's suggestion to dump unreclaimable slab info when the amount
> of unreclaimable slabs > total user memory, not only in the oom panic path.

Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since v2
because there are

mutex_lock(&slab_mutex);
kmalloc(GFP_KERNEL);
mutex_unlock(&slab_mutex);

users. If we call dump_unreclaimable_slab() on the non-panic OOM path, aren't we
introducing a risk of crash (i.e. kernel panic) for the regular OOM path?

We can try mutex_trylock() from dump_unreclaimable_slab() at best.
But it still remains unsafe, doesn't it?

