Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-05 Thread Eric W. Biederman
Kirill Tkhai  writes:

> On 01.06.2018 18:25, Eric W. Biederman wrote:
>> Michal Hocko  writes:
>> 
>>> On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
 Michal Hocko  writes:
>>> [...]
> Group leader exiting early without tearing down the whole thread
> group should be quite rare as well. No question that somebody might do
> that on purpose though...

 The group leader exiting early is a completely legitimate and reasonable
 thing to do, even if it is rare.
>>>
>>> I am not saying it isn't legitimate. But the most common case is the
>>> main thread waiting for its threads or calling exit which would tear the
>>> whole group down. Is there any easy way to achieve this other than tkill
>>> to group leader? Calling exit(3) from the leader performs group exit
>>> IIRC.
>> 
>> pthread_exit from the group leader.
>> 
>>> I am not arguing this is non-issue. And it certainly is a problem once
>>> somebody wants to be nasty... I was more interested how often this
>>> really happens for sane workloads.
>> 
>> That is a fair question.  All I know for certain is that whatever Kirill
>> Tkhai's workload was it was triggering this the slow path.
>
> It was triggered on a server, where many VPS of many people are hosted.
> Sorry, I have no an idea what they did.

That at least tells us it was naturally occurring.  Which makes this a
real problem in the real world.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-05 Thread Eric W. Biederman
Kirill Tkhai  writes:

> On 01.06.2018 18:25, Eric W. Biederman wrote:
>> Michal Hocko  writes:
>> 
>>> On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
 Michal Hocko  writes:
>>> [...]
> Group leader exiting early without tearing down the whole thread
> group should be quite rare as well. No question that somebody might do
> that on purpose though...

 The group leader exiting early is a completely legitimate and reasonable
 thing to do, even if it is rare.
>>>
>>> I am not saying it isn't legitimate. But the most common case is the
>>> main thread waiting for its threads or calling exit which would tear the
>>> whole group down. Is there any easy way to achieve this other than tkill
>>> to group leader? Calling exit(3) from the leader performs group exit
>>> IIRC.
>> 
>> pthread_exit from the group leader.
>> 
>>> I am not arguing this is non-issue. And it certainly is a problem once
>>> somebody wants to be nasty... I was more interested how often this
>>> really happens for sane workloads.
>> 
>> That is a fair question.  All I know for certain is that whatever Kirill
>> Tkhai's workload was it was triggering this the slow path.
>
> It was triggered on a server, where many VPS of many people are hosted.
> Sorry, I have no an idea what they did.

That at least tells us it was naturally occurring.  Which makes this a
real problem in the real world.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-05 Thread Kirill Tkhai
On 01.06.2018 18:25, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
>> On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
>>> Michal Hocko  writes:
>> [...]
 Group leader exiting early without tearing down the whole thread
 group should be quite rare as well. No question that somebody might do
 that on purpose though...
>>>
>>> The group leader exiting early is a completely legitimate and reasonable
>>> thing to do, even if it is rare.
>>
>> I am not saying it isn't legitimate. But the most common case is the
>> main thread waiting for its threads or calling exit which would tear the
>> whole group down. Is there any easy way to achieve this other than tkill
>> to group leader? Calling exit(3) from the leader performs group exit
>> IIRC.
> 
> pthread_exit from the group leader.
> 
>> I am not arguing this is non-issue. And it certainly is a problem once
>> somebody wants to be nasty... I was more interested how often this
>> really happens for sane workloads.
> 
> That is a fair question.  All I know for certain is that whatever Kirill
> Tkhai's workload was it was triggering this the slow path.

It was triggered on a server, where many VPS of many people are hosted.
Sorry, I have no an idea what they did.

Kirill


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-05 Thread Kirill Tkhai
On 01.06.2018 18:25, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
>> On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
>>> Michal Hocko  writes:
>> [...]
 Group leader exiting early without tearing down the whole thread
 group should be quite rare as well. No question that somebody might do
 that on purpose though...
>>>
>>> The group leader exiting early is a completely legitimate and reasonable
>>> thing to do, even if it is rare.
>>
>> I am not saying it isn't legitimate. But the most common case is the
>> main thread waiting for its threads or calling exit which would tear the
>> whole group down. Is there any easy way to achieve this other than tkill
>> to group leader? Calling exit(3) from the leader performs group exit
>> IIRC.
> 
> pthread_exit from the group leader.
> 
>> I am not arguing this is non-issue. And it certainly is a problem once
>> somebody wants to be nasty... I was more interested how often this
>> really happens for sane workloads.
> 
> That is a fair question.  All I know for certain is that whatever Kirill
> Tkhai's workload was it was triggering this the slow path.

It was triggered on a server, where many VPS of many people are hosted.
Sorry, I have no an idea what they did.

Kirill


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-05 Thread Michal Hocko
On Mon 04-06-18 09:31:46, Eric W. Biederman wrote:
[...]
> My key point is that it is easy to trigger which makes the current
> mm_update_next_owner a fundamentally flawed design, and something that
> needs to be fixed.

Ohh, absolutely agreed! I was not trying to argue that part of course.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-05 Thread Michal Hocko
On Mon 04-06-18 09:31:46, Eric W. Biederman wrote:
[...]
> My key point is that it is easy to trigger which makes the current
> mm_update_next_owner a fundamentally flawed design, and something that
> needs to be fixed.

Ohh, absolutely agreed! I was not trying to argue that part of course.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-04 Thread Eric W. Biederman
Michal Hocko  writes:

> On Fri 01-06-18 10:25:59, Eric W. Biederman wrote:
>> Michal Hocko  writes:
>> 
>> > On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
>> >> Michal Hocko  writes:
>> > [...]
>> >> > Group leader exiting early without tearing down the whole thread
>> >> > group should be quite rare as well. No question that somebody might do
>> >> > that on purpose though...
>> >> 
>> >> The group leader exiting early is a completely legitimate and reasonable
>> >> thing to do, even if it is rare.
>> >
>> > I am not saying it isn't legitimate. But the most common case is the
>> > main thread waiting for its threads or calling exit which would tear the
>> > whole group down. Is there any easy way to achieve this other than tkill
>> > to group leader? Calling exit(3) from the leader performs group exit
>> > IIRC.
>> 
>> pthread_exit from the group leader.
>
> Right, forgot to mention this one but this would be quite exotic,
> right?

Not exotic.  It is easy to do and well defined.

It would be easy to do if you are running a thread pool and closing
unnecessary threads.  Or frankly anything where the application is not
assigning a special role to the initial thread.

It does seem rare enough that no one has noticed how attrocious
mm_update_next_owner is until now.

My key point is that it is easy to trigger which makes the current
mm_update_next_owner a fundamentally flawed design, and something that
needs to be fixed.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-04 Thread Eric W. Biederman
Michal Hocko  writes:

> On Fri 01-06-18 10:25:59, Eric W. Biederman wrote:
>> Michal Hocko  writes:
>> 
>> > On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
>> >> Michal Hocko  writes:
>> > [...]
>> >> > Group leader exiting early without tearing down the whole thread
>> >> > group should be quite rare as well. No question that somebody might do
>> >> > that on purpose though...
>> >> 
>> >> The group leader exiting early is a completely legitimate and reasonable
>> >> thing to do, even if it is rare.
>> >
>> > I am not saying it isn't legitimate. But the most common case is the
>> > main thread waiting for its threads or calling exit which would tear the
>> > whole group down. Is there any easy way to achieve this other than tkill
>> > to group leader? Calling exit(3) from the leader performs group exit
>> > IIRC.
>> 
>> pthread_exit from the group leader.
>
> Right, forgot to mention this one but this would be quite exotic,
> right?

Not exotic.  It is easy to do and well defined.

It would be easy to do if you are running a thread pool and closing
unnecessary threads.  Or frankly anything where the application is not
assigning a special role to the initial thread.

It does seem rare enough that no one has noticed how attrocious
mm_update_next_owner is until now.

My key point is that it is easy to trigger which makes the current
mm_update_next_owner a fundamentally flawed design, and something that
needs to be fixed.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-04 Thread Michal Hocko
On Fri 01-06-18 10:25:59, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
> > On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
> >> Michal Hocko  writes:
> > [...]
> >> > Group leader exiting early without tearing down the whole thread
> >> > group should be quite rare as well. No question that somebody might do
> >> > that on purpose though...
> >> 
> >> The group leader exiting early is a completely legitimate and reasonable
> >> thing to do, even if it is rare.
> >
> > I am not saying it isn't legitimate. But the most common case is the
> > main thread waiting for its threads or calling exit which would tear the
> > whole group down. Is there any easy way to achieve this other than tkill
> > to group leader? Calling exit(3) from the leader performs group exit
> > IIRC.
> 
> pthread_exit from the group leader.

Right, forgot to mention this one but this would be quite exotic, right?


> > I am not arguing this is non-issue. And it certainly is a problem once
> > somebody wants to be nasty... I was more interested how often this
> > really happens for sane workloads.
> 
> That is a fair question.  All I know for certain is that whatever Kirill
> Tkhai's workload was it was triggering this the slow path.

Yeah, that was exactly why I've asked that originally. It must be
something pretty special ;)
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-04 Thread Michal Hocko
On Fri 01-06-18 10:25:59, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
> > On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
> >> Michal Hocko  writes:
> > [...]
> >> > Group leader exiting early without tearing down the whole thread
> >> > group should be quite rare as well. No question that somebody might do
> >> > that on purpose though...
> >> 
> >> The group leader exiting early is a completely legitimate and reasonable
> >> thing to do, even if it is rare.
> >
> > I am not saying it isn't legitimate. But the most common case is the
> > main thread waiting for its threads or calling exit which would tear the
> > whole group down. Is there any easy way to achieve this other than tkill
> > to group leader? Calling exit(3) from the leader performs group exit
> > IIRC.
> 
> pthread_exit from the group leader.

Right, forgot to mention this one but this would be quite exotic, right?


> > I am not arguing this is non-issue. And it certainly is a problem once
> > somebody wants to be nasty... I was more interested how often this
> > really happens for sane workloads.
> 
> That is a fair question.  All I know for certain is that whatever Kirill
> Tkhai's workload was it was triggering this the slow path.

Yeah, that was exactly why I've asked that originally. It must be
something pretty special ;)
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-01 Thread Eric W. Biederman
Michal Hocko  writes:

> On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
>> Michal Hocko  writes:
> [...]
>> > Group leader exiting early without tearing down the whole thread
>> > group should be quite rare as well. No question that somebody might do
>> > that on purpose though...
>> 
>> The group leader exiting early is a completely legitimate and reasonable
>> thing to do, even if it is rare.
>
> I am not saying it isn't legitimate. But the most common case is the
> main thread waiting for its threads or calling exit which would tear the
> whole group down. Is there any easy way to achieve this other than tkill
> to group leader? Calling exit(3) from the leader performs group exit
> IIRC.

pthread_exit from the group leader.

> I am not arguing this is non-issue. And it certainly is a problem once
> somebody wants to be nasty... I was more interested how often this
> really happens for sane workloads.

That is a fair question.  All I know for certain is that whatever Kirill
Tkhai's workload was it was triggering this the slow path.

Eric



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-01 Thread Eric W. Biederman
Michal Hocko  writes:

> On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
>> Michal Hocko  writes:
> [...]
>> > Group leader exiting early without tearing down the whole thread
>> > group should be quite rare as well. No question that somebody might do
>> > that on purpose though...
>> 
>> The group leader exiting early is a completely legitimate and reasonable
>> thing to do, even if it is rare.
>
> I am not saying it isn't legitimate. But the most common case is the
> main thread waiting for its threads or calling exit which would tear the
> whole group down. Is there any easy way to achieve this other than tkill
> to group leader? Calling exit(3) from the leader performs group exit
> IIRC.

pthread_exit from the group leader.

> I am not arguing this is non-issue. And it certainly is a problem once
> somebody wants to be nasty... I was more interested how often this
> really happens for sane workloads.

That is a fair question.  All I know for certain is that whatever Kirill
Tkhai's workload was it was triggering this the slow path.

Eric



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-01 Thread Michal Hocko
On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
> Michal Hocko  writes:
[...]
> > Group leader exiting early without tearing down the whole thread
> > group should be quite rare as well. No question that somebody might do
> > that on purpose though...
> 
> The group leader exiting early is a completely legitimate and reasonable
> thing to do, even if it is rare.

I am not saying it isn't legitimate. But the most common case is the
main thread waiting for its threads or calling exit which would tear the
whole group down. Is there any easy way to achieve this other than tkill
to group leader? Calling exit(3) from the leader performs group exit
IIRC.

I am not arguing this is non-issue. And it certainly is a problem once
somebody wants to be nasty... I was more interested how often this
really happens for sane workloads.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-01 Thread Michal Hocko
On Fri 01-06-18 09:32:42, Eric W. Biederman wrote:
> Michal Hocko  writes:
[...]
> > Group leader exiting early without tearing down the whole thread
> > group should be quite rare as well. No question that somebody might do
> > that on purpose though...
> 
> The group leader exiting early is a completely legitimate and reasonable
> thing to do, even if it is rare.

I am not saying it isn't legitimate. But the most common case is the
main thread waiting for its threads or calling exit which would tear the
whole group down. Is there any easy way to achieve this other than tkill
to group leader? Calling exit(3) from the leader performs group exit
IIRC.

I am not arguing this is non-issue. And it certainly is a problem once
somebody wants to be nasty... I was more interested how often this
really happens for sane workloads.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-01 Thread Eric W. Biederman
Michal Hocko  writes:

> On Thu 31-05-18 20:07:28, Eric W. Biederman wrote:
>> Michal Hocko  writes:
>> 
>> > On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> >> This function searches for a new mm owner in children and siblings,
>> >> and then iterates over all processes in the system in unlikely case.
>> >> Despite the case is unlikely, its probability growths with the number
>> >> of processes in the system. The time, spent on iterations, also growths.
>> >> I regulary observe mm_update_next_owner() in crash dumps (not related
>> >> to this function) of the nodes with many processes (20K+), so it looks
>> >> like it's not so unlikely case.
>> >
>> > Did you manage to find the pattern that forces mm_update_next_owner to
>> > slow paths? This really shouldn't trigger very often. If we can fallback
>> > easily then I suspect that we should be better off reconsidering
>> > mm->owner and try to come up with something more clever. I've had a
>> > patch to remove owner few years back. It needed some work to finish but
>> > maybe that would be a better than try to make non-scalable thing suck
>> > less.
>> 
>> Reading through the code I just found a trivial pattern that triggers
>> this.  Create a multi-threaded process.  Have the thread group leader
>> (the first thread) exit.
>
> Hmm, I thought that we try to iterate over threads in the same thread
> group first. But we are not doing that. Anyway just CLONE_VM without
> CLONE_THREAD would achieve the same pathological path but that should be
> rare.

Yes, if the child exited.   The code searches the children and siblings
but the parents of the process that exited.

> Group leader exiting early without tearing down the whole thread
> group should be quite rare as well. No question that somebody might do
> that on purpose though...

The group leader exiting early is a completely legitimate and reasonable
thing to do, even if it is rare.

I think all it would take is one program like that in a work-load for
the performance to descend into something unpleasant.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-01 Thread Eric W. Biederman
Michal Hocko  writes:

> On Thu 31-05-18 20:07:28, Eric W. Biederman wrote:
>> Michal Hocko  writes:
>> 
>> > On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> >> This function searches for a new mm owner in children and siblings,
>> >> and then iterates over all processes in the system in unlikely case.
>> >> Despite the case is unlikely, its probability growths with the number
>> >> of processes in the system. The time, spent on iterations, also growths.
>> >> I regulary observe mm_update_next_owner() in crash dumps (not related
>> >> to this function) of the nodes with many processes (20K+), so it looks
>> >> like it's not so unlikely case.
>> >
>> > Did you manage to find the pattern that forces mm_update_next_owner to
>> > slow paths? This really shouldn't trigger very often. If we can fallback
>> > easily then I suspect that we should be better off reconsidering
>> > mm->owner and try to come up with something more clever. I've had a
>> > patch to remove owner few years back. It needed some work to finish but
>> > maybe that would be a better than try to make non-scalable thing suck
>> > less.
>> 
>> Reading through the code I just found a trivial pattern that triggers
>> this.  Create a multi-threaded process.  Have the thread group leader
>> (the first thread) exit.
>
> Hmm, I thought that we try to iterate over threads in the same thread
> group first. But we are not doing that. Anyway just CLONE_VM without
> CLONE_THREAD would achieve the same pathological path but that should be
> rare.

Yes, if the child exited.   The code searches the children and siblings
but the parents of the process that exited.

> Group leader exiting early without tearing down the whole thread
> group should be quite rare as well. No question that somebody might do
> that on purpose though...

The group leader exiting early is a completely legitimate and reasonable
thing to do, even if it is rare.

I think all it would take is one program like that in a work-load for
the performance to descend into something unpleasant.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-01 Thread Michal Hocko
On Thu 31-05-18 20:07:28, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
> > On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
> >> This function searches for a new mm owner in children and siblings,
> >> and then iterates over all processes in the system in unlikely case.
> >> Despite the case is unlikely, its probability growths with the number
> >> of processes in the system. The time, spent on iterations, also growths.
> >> I regulary observe mm_update_next_owner() in crash dumps (not related
> >> to this function) of the nodes with many processes (20K+), so it looks
> >> like it's not so unlikely case.
> >
> > Did you manage to find the pattern that forces mm_update_next_owner to
> > slow paths? This really shouldn't trigger very often. If we can fallback
> > easily then I suspect that we should be better off reconsidering
> > mm->owner and try to come up with something more clever. I've had a
> > patch to remove owner few years back. It needed some work to finish but
> > maybe that would be a better than try to make non-scalable thing suck
> > less.
> 
> Reading through the code I just found a trivial pattern that triggers
> this.  Create a multi-threaded process.  Have the thread group leader
> (the first thread) exit.

Hmm, I thought that we try to iterate over threads in the same thread
group first. But we are not doing that. Anyway just CLONE_VM without
CLONE_THREAD would achieve the same pathological path but that should be
rare. Group leader exiting early without tearing down the whole thread
group should be quite rare as well. No question that somebody might do
that on purpose though...

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-06-01 Thread Michal Hocko
On Thu 31-05-18 20:07:28, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
> > On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
> >> This function searches for a new mm owner in children and siblings,
> >> and then iterates over all processes in the system in unlikely case.
> >> Despite the case is unlikely, its probability growths with the number
> >> of processes in the system. The time, spent on iterations, also growths.
> >> I regulary observe mm_update_next_owner() in crash dumps (not related
> >> to this function) of the nodes with many processes (20K+), so it looks
> >> like it's not so unlikely case.
> >
> > Did you manage to find the pattern that forces mm_update_next_owner to
> > slow paths? This really shouldn't trigger very often. If we can fallback
> > easily then I suspect that we should be better off reconsidering
> > mm->owner and try to come up with something more clever. I've had a
> > patch to remove owner few years back. It needed some work to finish but
> > maybe that would be a better than try to make non-scalable thing suck
> > less.
> 
> Reading through the code I just found a trivial pattern that triggers
> this.  Create a multi-threaded process.  Have the thread group leader
> (the first thread) exit.

Hmm, I thought that we try to iterate over threads in the same thread
group first. But we are not doing that. Anyway just CLONE_VM without
CLONE_THREAD would achieve the same pathological path but that should be
rare. Group leader exiting early without tearing down the whole thread
group should be quite rare as well. No question that somebody might do
that on purpose though...

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-05-31 Thread Eric W. Biederman
Michal Hocko  writes:

> On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> This function searches for a new mm owner in children and siblings,
>> and then iterates over all processes in the system in unlikely case.
>> Despite the case is unlikely, its probability growths with the number
>> of processes in the system. The time, spent on iterations, also growths.
>> I regulary observe mm_update_next_owner() in crash dumps (not related
>> to this function) of the nodes with many processes (20K+), so it looks
>> like it's not so unlikely case.
>
> Did you manage to find the pattern that forces mm_update_next_owner to
> slow paths? This really shouldn't trigger very often. If we can fallback
> easily then I suspect that we should be better off reconsidering
> mm->owner and try to come up with something more clever. I've had a
> patch to remove owner few years back. It needed some work to finish but
> maybe that would be a better than try to make non-scalable thing suck
> less.

Reading through the code I just found a trivial pattern that triggers
this.  Create a multi-threaded process.  Have the thread group leader
(the first thread) exit.

This has the potential to be a significant DOS attack if anyone cares.

Eric



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-05-31 Thread Eric W. Biederman
Michal Hocko  writes:

> On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> This function searches for a new mm owner in children and siblings,
>> and then iterates over all processes in the system in unlikely case.
>> Despite the case is unlikely, its probability growths with the number
>> of processes in the system. The time, spent on iterations, also growths.
>> I regulary observe mm_update_next_owner() in crash dumps (not related
>> to this function) of the nodes with many processes (20K+), so it looks
>> like it's not so unlikely case.
>
> Did you manage to find the pattern that forces mm_update_next_owner to
> slow paths? This really shouldn't trigger very often. If we can fallback
> easily then I suspect that we should be better off reconsidering
> mm->owner and try to come up with something more clever. I've had a
> patch to remove owner few years back. It needed some work to finish but
> maybe that would be a better than try to make non-scalable thing suck
> less.

Reading through the code I just found a trivial pattern that triggers
this.  Create a multi-threaded process.  Have the thread group leader
(the first thread) exit.

This has the potential to be a significant DOS attack if anyone cares.

Eric



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-05-03 Thread Kirill Tkhai
On 27.04.2018 21:05, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
>> On Thu 26-04-18 21:28:18, Michal Hocko wrote:
>>> On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
 Michal Hocko  writes:

> I've had a patch to remove owner few years back. It needed some work
> to finish but maybe that would be a better than try to make
> non-scalable thing suck less.

 I have a question.  Would it be reasonable to just have a mm->memcg?
 That would appear to be the simplest solution to the problem.
>>>
>>> I do not remember details. Have to re-read the whole thing again. Hope
>>> to get to this soon but with the current jet lag and backlog from the
>>> LSFMM I rather not promis anything. Going with mm->memcg would be the
>>> most simple of course but I have a very vague recollection that it was
>>> not possible. Maybe I misremember...
>>
>> Just for the record, the last version where I've tried to remove owner
>> was posted here: 
>> http://lkml.kernel.org/r/1436358472-29137-1-git-send-email-mho...@kernel.org
>>
>> I didn't get to remember details yet, but the primary problem was the
>> task migration between cgroups and the nasty case when different thread
>> grounds share the mm. At some point I just suggested to not care
>> about semantic of these weird threads all that much. We can either
>> migrate all tasks sharing the mm struct or just keep the inconsistency.
>>
>> Anyway, removing this ugliness would be so cool!
> 
> I suspect the only common user of CLONE_VM today is vfork.  And I do
> think it is crazy to migrate a process that has called vfork before
> calling exec.  Other useses of CLONE_VM seem even crazier.
> 
> I think the easiest change to make in mem_cgroup_can_attach would
> be just to change the test for when charges are migrated.  AKA
> 
> from:
> 
>   if (mm->owner == p) {
>   
> }
> 
> to
>   if (mem_cgroup_from_task(p) == mm->memcg) {
>   ...
> }
> 
> That allows using mm->memcg with no new limitations on when migration
> can be called.  In crazy cases that has the potential to change which
> memcgroup the charges are accounted to, but the choice is already
> somewhat arbitrary so I don't think that will be a problem.  Especially
> given that mm_update_next_owner does not migrate charges if the next
> owner is in a different memory cgroup.  A mm with tasks using it in
> two different cgroups is already questionable if not outright
> problematic.
> 
> 
> Kirill Tkhai do you think you would be able adapt Michal Hoko's old
> patch at https://marc.info/?l=linux-kernel=143635857131756=2
> that replaces mm->owner with mm->memcg?

I was at vacation. Sorry for the late reply.
 
> We probably want to outlaw migrating an mm where we are not migrating
> all of the mm->users eventually.  Just because that case is crazy.
> But it doesn't look like we need to do that to fix the memory control
> group data structures.

Kirill


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-05-03 Thread Kirill Tkhai
On 27.04.2018 21:05, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
>> On Thu 26-04-18 21:28:18, Michal Hocko wrote:
>>> On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
 Michal Hocko  writes:

> I've had a patch to remove owner few years back. It needed some work
> to finish but maybe that would be a better than try to make
> non-scalable thing suck less.

 I have a question.  Would it be reasonable to just have a mm->memcg?
 That would appear to be the simplest solution to the problem.
>>>
>>> I do not remember details. Have to re-read the whole thing again. Hope
>>> to get to this soon but with the current jet lag and backlog from the
>>> LSFMM I rather not promis anything. Going with mm->memcg would be the
>>> most simple of course but I have a very vague recollection that it was
>>> not possible. Maybe I misremember...
>>
>> Just for the record, the last version where I've tried to remove owner
>> was posted here: 
>> http://lkml.kernel.org/r/1436358472-29137-1-git-send-email-mho...@kernel.org
>>
>> I didn't get to remember details yet, but the primary problem was the
>> task migration between cgroups and the nasty case when different thread
>> grounds share the mm. At some point I just suggested to not care
>> about semantic of these weird threads all that much. We can either
>> migrate all tasks sharing the mm struct or just keep the inconsistency.
>>
>> Anyway, removing this ugliness would be so cool!
> 
> I suspect the only common user of CLONE_VM today is vfork.  And I do
> think it is crazy to migrate a process that has called vfork before
> calling exec.  Other useses of CLONE_VM seem even crazier.
> 
> I think the easiest change to make in mem_cgroup_can_attach would
> be just to change the test for when charges are migrated.  AKA
> 
> from:
> 
>   if (mm->owner == p) {
>   
> }
> 
> to
>   if (mem_cgroup_from_task(p) == mm->memcg) {
>   ...
> }
> 
> That allows using mm->memcg with no new limitations on when migration
> can be called.  In crazy cases that has the potential to change which
> memcgroup the charges are accounted to, but the choice is already
> somewhat arbitrary so I don't think that will be a problem.  Especially
> given that mm_update_next_owner does not migrate charges if the next
> owner is in a different memory cgroup.  A mm with tasks using it in
> two different cgroups is already questionable if not outright
> problematic.
> 
> 
> Kirill Tkhai do you think you would be able adapt Michal Hoko's old
> patch at https://marc.info/?l=linux-kernel=143635857131756=2
> that replaces mm->owner with mm->memcg?

I was at vacation. Sorry for the late reply.
 
> We probably want to outlaw migrating an mm where we are not migrating
> all of the mm->users eventually.  Just because that case is crazy.
> But it doesn't look like we need to do that to fix the memory control
> group data structures.

Kirill


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-05-01 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> Kirill Tkhai do you think you would be able adapt Michal Hoko's old
> patch at https://marc.info/?l=linux-kernel=143635857131756=2
> that replaces mm->owner with mm->memcg?

Ouch.  At least compared to the current kernel your patch is wide
of the mark of what is needed to use mm->memcg correctly.

I have however written my own and if someone can review it I would
really appreciate it.  It looks like we can change mm->owner to
mm->memcg without too much trouble and reduce the amount of code
in the kernel at the same time.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-05-01 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> Kirill Tkhai do you think you would be able adapt Michal Hoko's old
> patch at https://marc.info/?l=linux-kernel=143635857131756=2
> that replaces mm->owner with mm->memcg?

Ouch.  At least compared to the current kernel your patch is wide
of the mark of what is needed to use mm->memcg correctly.

I have however written my own and if someone can review it I would
really appreciate it.  It looks like we can change mm->owner to
mm->memcg without too much trouble and reduce the amount of code
in the kernel at the same time.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-27 Thread Eric W. Biederman
Michal Hocko  writes:

> On Thu 26-04-18 21:28:18, Michal Hocko wrote:
>> On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
>> > Michal Hocko  writes:
>> > 
>> > > I've had a patch to remove owner few years back. It needed some work
>> > > to finish but maybe that would be a better than try to make
>> > > non-scalable thing suck less.
>> > 
>> > I have a question.  Would it be reasonable to just have a mm->memcg?
>> > That would appear to be the simplest solution to the problem.
>> 
>> I do not remember details. Have to re-read the whole thing again. Hope
>> to get to this soon but with the current jet lag and backlog from the
>> LSFMM I rather not promis anything. Going with mm->memcg would be the
>> most simple of course but I have a very vague recollection that it was
>> not possible. Maybe I misremember...
>
> Just for the record, the last version where I've tried to remove owner
> was posted here: 
> http://lkml.kernel.org/r/1436358472-29137-1-git-send-email-mho...@kernel.org
>
> I didn't get to remember details yet, but the primary problem was the
> task migration between cgroups and the nasty case when different thread
> grounds share the mm. At some point I just suggested to not care
> about semantic of these weird threads all that much. We can either
> migrate all tasks sharing the mm struct or just keep the inconsistency.
>
> Anyway, removing this ugliness would be so cool!

I suspect the only common user of CLONE_VM today is vfork.  And I do
think it is crazy to migrate a process that has called vfork before
calling exec.  Other useses of CLONE_VM seem even crazier.

I think the easiest change to make in mem_cgroup_can_attach would
be just to change the test for when charges are migrated.  AKA

from:

if (mm->owner == p) {

}

to
if (mem_cgroup_from_task(p) == mm->memcg) {
...
}

That allows using mm->memcg with no new limitations on when migration
can be called.  In crazy cases that has the potential to change which
memcgroup the charges are accounted to, but the choice is already
somewhat arbitrary so I don't think that will be a problem.  Especially
given that mm_update_next_owner does not migrate charges if the next
owner is in a different memory cgroup.  A mm with tasks using it in
two different cgroups is already questionable if not outright
problematic.


Kirill Tkhai do you think you would be able adapt Michal Hoko's old
patch at https://marc.info/?l=linux-kernel=143635857131756=2
that replaces mm->owner with mm->memcg?

We probably want to outlaw migrating an mm where we are not migrating
all of the mm->users eventually.  Just because that case is crazy.
But it doesn't look like we need to do that to fix the memory control
group data structures.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-27 Thread Eric W. Biederman
Michal Hocko  writes:

> On Thu 26-04-18 21:28:18, Michal Hocko wrote:
>> On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
>> > Michal Hocko  writes:
>> > 
>> > > I've had a patch to remove owner few years back. It needed some work
>> > > to finish but maybe that would be a better than try to make
>> > > non-scalable thing suck less.
>> > 
>> > I have a question.  Would it be reasonable to just have a mm->memcg?
>> > That would appear to be the simplest solution to the problem.
>> 
>> I do not remember details. Have to re-read the whole thing again. Hope
>> to get to this soon but with the current jet lag and backlog from the
>> LSFMM I rather not promis anything. Going with mm->memcg would be the
>> most simple of course but I have a very vague recollection that it was
>> not possible. Maybe I misremember...
>
> Just for the record, the last version where I've tried to remove owner
> was posted here: 
> http://lkml.kernel.org/r/1436358472-29137-1-git-send-email-mho...@kernel.org
>
> I didn't get to remember details yet, but the primary problem was the
> task migration between cgroups and the nasty case when different thread
> grounds share the mm. At some point I just suggested to not care
> about semantic of these weird threads all that much. We can either
> migrate all tasks sharing the mm struct or just keep the inconsistency.
>
> Anyway, removing this ugliness would be so cool!

I suspect the only common user of CLONE_VM today is vfork.  And I do
think it is crazy to migrate a process that has called vfork before
calling exec.  Other useses of CLONE_VM seem even crazier.

I think the easiest change to make in mem_cgroup_can_attach would
be just to change the test for when charges are migrated.  AKA

from:

if (mm->owner == p) {

}

to
if (mem_cgroup_from_task(p) == mm->memcg) {
...
}

That allows using mm->memcg with no new limitations on when migration
can be called.  In crazy cases that has the potential to change which
memcgroup the charges are accounted to, but the choice is already
somewhat arbitrary so I don't think that will be a problem.  Especially
given that mm_update_next_owner does not migrate charges if the next
owner is in a different memory cgroup.  A mm with tasks using it in
two different cgroups is already questionable if not outright
problematic.


Kirill Tkhai do you think you would be able adapt Michal Hoko's old
patch at https://marc.info/?l=linux-kernel=143635857131756=2
that replaces mm->owner with mm->memcg?

We probably want to outlaw migrating an mm where we are not migrating
all of the mm->users eventually.  Just because that case is crazy.
But it doesn't look like we need to do that to fix the memory control
group data structures.

Eric


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-27 Thread Michal Hocko
On Thu 26-04-18 21:28:18, Michal Hocko wrote:
> On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
> > Michal Hocko  writes:
> > 
> > > I've had a patch to remove owner few years back. It needed some work
> > > to finish but maybe that would be a better than try to make
> > > non-scalable thing suck less.
> > 
> > I have a question.  Would it be reasonable to just have a mm->memcg?
> > That would appear to be the simplest solution to the problem.
> 
> I do not remember details. Have to re-read the whole thing again. Hope
> to get to this soon but with the current jet lag and backlog from the
> LSFMM I rather not promis anything. Going with mm->memcg would be the
> most simple of course but I have a very vague recollection that it was
> not possible. Maybe I misremember...

Just for the record, the last version where I've tried to remove owner
was posted here: 
http://lkml.kernel.org/r/1436358472-29137-1-git-send-email-mho...@kernel.org

I didn't get to remember details yet, but the primary problem was the
task migration between cgroups and the nasty case when different thread
grounds share the mm. At some point I just suggested to not care
about semantic of these weird threads all that much. We can either
migrate all tasks sharing the mm struct or just keep the inconsistency.

Anyway, removing this ugliness would be so cool!
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-27 Thread Michal Hocko
On Thu 26-04-18 21:28:18, Michal Hocko wrote:
> On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
> > Michal Hocko  writes:
> > 
> > > I've had a patch to remove owner few years back. It needed some work
> > > to finish but maybe that would be a better than try to make
> > > non-scalable thing suck less.
> > 
> > I have a question.  Would it be reasonable to just have a mm->memcg?
> > That would appear to be the simplest solution to the problem.
> 
> I do not remember details. Have to re-read the whole thing again. Hope
> to get to this soon but with the current jet lag and backlog from the
> LSFMM I rather not promis anything. Going with mm->memcg would be the
> most simple of course but I have a very vague recollection that it was
> not possible. Maybe I misremember...

Just for the record, the last version where I've tried to remove owner
was posted here: 
http://lkml.kernel.org/r/1436358472-29137-1-git-send-email-mho...@kernel.org

I didn't get to remember details yet, but the primary problem was the
task migration between cgroups and the nasty case when different thread
grounds share the mm. At some point I just suggested to not care
about semantic of these weird threads all that much. We can either
migrate all tasks sharing the mm struct or just keep the inconsistency.

Anyway, removing this ugliness would be so cool!
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Michal Hocko
On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
> > I've had a patch to remove owner few years back. It needed some work
> > to finish but maybe that would be a better than try to make
> > non-scalable thing suck less.
> 
> I have a question.  Would it be reasonable to just have a mm->memcg?
> That would appear to be the simplest solution to the problem.

I do not remember details. Have to re-read the whole thing again. Hope
to get to this soon but with the current jet lag and backlog from the
LSFMM I rather not promis anything. Going with mm->memcg would be the
most simple of course but I have a very vague recollection that it was
not possible. Maybe I misremember...

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Michal Hocko
On Thu 26-04-18 11:19:33, Eric W. Biederman wrote:
> Michal Hocko  writes:
> 
> > I've had a patch to remove owner few years back. It needed some work
> > to finish but maybe that would be a better than try to make
> > non-scalable thing suck less.
> 
> I have a question.  Would it be reasonable to just have a mm->memcg?
> That would appear to be the simplest solution to the problem.

I do not remember details. Have to re-read the whole thing again. Hope
to get to this soon but with the current jet lag and backlog from the
LSFMM I rather not promis anything. Going with mm->memcg would be the
most simple of course but I have a very vague recollection that it was
not possible. Maybe I misremember...

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Eric W. Biederman
Michal Hocko  writes:

> I've had a patch to remove owner few years back. It needed some work
> to finish but maybe that would be a better than try to make
> non-scalable thing suck less.

I have a question.  Would it be reasonable to just have a mm->memcg?
That would appear to be the simplest solution to the problem.

That would require failing migration between memory cgroups if you are
not moving all of processes/threads that have a given mm_struct.  That
should not be a huge restriction as typically it is only threads that
share a mm.  Further the check should be straigh forward: counting the
number of threads and verifying the count matches the count on the
mm_struct.

Eric








Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Eric W. Biederman
Michal Hocko  writes:

> I've had a patch to remove owner few years back. It needed some work
> to finish but maybe that would be a better than try to make
> non-scalable thing suck less.

I have a question.  Would it be reasonable to just have a mm->memcg?
That would appear to be the simplest solution to the problem.

That would require failing migration between memory cgroups if you are
not moving all of processes/threads that have a given mm_struct.  That
should not be a huge restriction as typically it is only threads that
share a mm.  Further the check should be straigh forward: counting the
number of threads and verifying the count matches the count on the
mm_struct.

Eric








Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Oleg Nesterov
On 04/26, Kirill Tkhai wrote:
>
> We can rework this simply by adding a list of tasks to mm.

Perhaps, but then I think this list should not depend on mm->owner.

I mean, mm->list_of_group_leaders_which_use_this_mm can be used by coredump
and oom-killer at least. But this is not that simple...

Oleg.



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Oleg Nesterov
On 04/26, Kirill Tkhai wrote:
>
> We can rework this simply by adding a list of tasks to mm.

Perhaps, but then I think this list should not depend on mm->owner.

I mean, mm->list_of_group_leaders_which_use_this_mm can be used by coredump
and oom-killer at least. But this is not that simple...

Oleg.



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Kirill Tkhai
On 26.04.2018 16:07, Michal Hocko wrote:
> On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> This function searches for a new mm owner in children and siblings,
>> and then iterates over all processes in the system in unlikely case.
>> Despite the case is unlikely, its probability growths with the number
>> of processes in the system. The time, spent on iterations, also growths.
>> I regulary observe mm_update_next_owner() in crash dumps (not related
>> to this function) of the nodes with many processes (20K+), so it looks
>> like it's not so unlikely case.
> 
> Did you manage to find the pattern that forces mm_update_next_owner to
> slow paths? This really shouldn't trigger very often. If we can fallback
> easily then I suspect that we should be better off reconsidering
> mm->owner and try to come up with something more clever. I've had a
> patch to remove owner few years back. It needed some work to finish but
> maybe that would be a better than try to make non-scalable thing suck
> less.

It's not easy to find a pattern on such the big number of processes, especially
because of only the final result is visible in crash. I assume, this may
be connected with some unexpected signals received by task set in topology,
but I'm not sure. Though, even it becomes the most unlikely situation with
some small not zero probability, it still has to be optimized to minimize
unexpected situations.

We can rework this simply by adding a list of tasks to mm. Also, reuse
task_lock() of the mm owner for that. Something like:

assign_task_mm(struct mm_struct *mm, struct task_struct *task)
{
int again;
again:
again = 0;
rcu_read_lock();
task = mm->owner;
if (!task)
goto unlock;
task_lock(task);
if (mm->owner = task)
llist_add(>mm_list, >task_list);
else
again = 1;
task_unlock(task);
unlock:
rcu_read_unlock();
if (again)
goto again;
}

Kirill


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Kirill Tkhai
On 26.04.2018 16:07, Michal Hocko wrote:
> On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> This function searches for a new mm owner in children and siblings,
>> and then iterates over all processes in the system in unlikely case.
>> Despite the case is unlikely, its probability growths with the number
>> of processes in the system. The time, spent on iterations, also growths.
>> I regulary observe mm_update_next_owner() in crash dumps (not related
>> to this function) of the nodes with many processes (20K+), so it looks
>> like it's not so unlikely case.
> 
> Did you manage to find the pattern that forces mm_update_next_owner to
> slow paths? This really shouldn't trigger very often. If we can fallback
> easily then I suspect that we should be better off reconsidering
> mm->owner and try to come up with something more clever. I've had a
> patch to remove owner few years back. It needed some work to finish but
> maybe that would be a better than try to make non-scalable thing suck
> less.

It's not easy to find a pattern on such the big number of processes, especially
because of only the final result is visible in crash. I assume, this may
be connected with some unexpected signals received by task set in topology,
but I'm not sure. Though, even it becomes the most unlikely situation with
some small not zero probability, it still has to be optimized to minimize
unexpected situations.

We can rework this simply by adding a list of tasks to mm. Also, reuse
task_lock() of the mm owner for that. Something like:

assign_task_mm(struct mm_struct *mm, struct task_struct *task)
{
int again;
again:
again = 0;
rcu_read_lock();
task = mm->owner;
if (!task)
goto unlock;
task_lock(task);
if (mm->owner = task)
llist_add(>mm_list, >task_list);
else
again = 1;
task_unlock(task);
unlock:
rcu_read_unlock();
if (again)
goto again;
}

Kirill


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Oleg Nesterov
On 04/26, Michal Hocko wrote:
>
> then I suspect that we should be better off reconsidering
> mm->owner and try to come up with something more clever. I've had a
> patch to remove owner few years back.

Yeees, I do remember we discussed this but I forgot everything.

I agree, it would nice to turn mm->owner into mm->mem_cgroup and just kill
the ugly mm_update_next_owner/etc.

Oleg.



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Oleg Nesterov
On 04/26, Michal Hocko wrote:
>
> then I suspect that we should be better off reconsidering
> mm->owner and try to come up with something more clever. I've had a
> patch to remove owner few years back.

Yeees, I do remember we discussed this but I forgot everything.

I agree, it would nice to turn mm->owner into mm->mem_cgroup and just kill
the ugly mm_update_next_owner/etc.

Oleg.



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Michal Hocko
On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
> This function searches for a new mm owner in children and siblings,
> and then iterates over all processes in the system in unlikely case.
> Despite the case is unlikely, its probability growths with the number
> of processes in the system. The time, spent on iterations, also growths.
> I regulary observe mm_update_next_owner() in crash dumps (not related
> to this function) of the nodes with many processes (20K+), so it looks
> like it's not so unlikely case.

Did you manage to find the pattern that forces mm_update_next_owner to
slow paths? This really shouldn't trigger very often. If we can fallback
easily then I suspect that we should be better off reconsidering
mm->owner and try to come up with something more clever. I've had a
patch to remove owner few years back. It needed some work to finish but
maybe that would be a better than try to make non-scalable thing suck
less.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-04-26 Thread Michal Hocko
On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
> This function searches for a new mm owner in children and siblings,
> and then iterates over all processes in the system in unlikely case.
> Despite the case is unlikely, its probability growths with the number
> of processes in the system. The time, spent on iterations, also growths.
> I regulary observe mm_update_next_owner() in crash dumps (not related
> to this function) of the nodes with many processes (20K+), so it looks
> like it's not so unlikely case.

Did you manage to find the pattern that forces mm_update_next_owner to
slow paths? This really shouldn't trigger very often. If we can fallback
easily then I suspect that we should be better off reconsidering
mm->owner and try to come up with something more clever. I've had a
patch to remove owner few years back. It needed some work to finish but
maybe that would be a better than try to make non-scalable thing suck
less.

-- 
Michal Hocko
SUSE Labs