Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-28 Thread Anshuman Khandual



On 05/21/2019 04:04 PM, Michal Hocko wrote:
> On Tue 21-05-19 08:25:55, Anshuman Khandual wrote:
>> On 05/20/2019 10:29 PM, Tim Murray wrote:
> [...]
>>> not seem to introduce a noticeable hot start penalty, nor does it
>>> cause an increase in performance problems later in the app's
>>> lifecycle. I've measured with and without process_madvise, and the
>>> differences are within our noise bounds. Second, because we're not
>>
>> That is assuming that post process_madvise() working set for the application
>> is always smaller. There is another challenge. The external process should
>> ideally have the knowledge of active areas of the working set for an
>> application in question for it to invoke process_madvise() correctly to
>> prevent such scenarios.
> 
> But that doesn't really seem relevant for the API itself, right? The
> higher level logic is the monitor's business.

Right. I was just wondering how the monitor would even decide which areas of
the target application are active or inactive. The target application is still
just an opaque entity for the monitor unless there is some sort of
communication. But you are right, this is not relevant to the API itself.


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-27 Thread Minchan Kim
On Thu, May 23, 2019 at 10:07:17PM +0900, Minchan Kim wrote:
> On Wed, May 22, 2019 at 09:01:33AM -0700, Daniel Colascione wrote:
> > On Wed, May 22, 2019 at 9:01 AM Christian Brauner  
> > wrote:
> > >
> > > On Wed, May 22, 2019 at 08:57:47AM -0700, Daniel Colascione wrote:
> > > > On Wed, May 22, 2019 at 8:48 AM Christian Brauner 
> > > >  wrote:
> > > > >
> > > > > On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > > > > > On Wed, May 22, 2019 at 7:52 AM Christian Brauner 
> > > > > >  wrote:
> > > > > > > I'm not going to go into yet another long argument. I prefer 
> > > > > > > pidfd_*.
> > > > > >
> > > > > > Ok. We're each allowed our opinion.
> > > > > >
> > > > > > > It's tied to the api, transparent for userspace, and 
> > > > > > > disambiguates it
> > > > > > > from process_vm_{read,write}v that both take a pid_t.
> > > > > >
> > > > > > Speaking of process_vm_readv and process_vm_writev: both have a
> > > > > > currently-unused flags argument. Both should grow a flag that tells
> > > > > > them to interpret the pid argument as a pidfd. Or do you support
> > > > > > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > > > > > should process_madvise be called pidfd_madvise while 
> > > > > > process_vm_readv
> > > > > > isn't called pidfd_vm_readv?
> > > > >
> > > > > Actually, you should then do the same with process_madvise() and give 
> > > > > it
> > > > > a flag for that too if that's not too crazy.
> > > >
> > > > I don't know what you mean. My gut feeling is that for the sake of
> > > > consistency, process_madvise, process_vm_readv, and process_vm_writev
> > > > should all accept a first argument interpreted as either a numeric PID
> > > > or a pidfd depending on a flag --- ideally the same flag. Is that what
> > > > you have in mind?
> > >
> > > Yes. For the sake of consistency they should probably all default to
> > > interpret as pid and if say PROCESS_{VM_}PIDFD is passed as flag
> > > interpret as pidfd.
> > 
> > Sounds good to me!
> 
> Then, I want to change from pidfd to pid in the next revision and stick to
> process_madvise as the name. Later, you guys could define a PROCESS_PIDFD
> flag and convert all the process_xxx syscalls at once.
> 
> If you are faster so that I see PROCESS_PIDFD earlier, I am happy to
> use it.

Hi Folks,

I don't want to consume a new API argument too early, so I will use
process_madvise with a pidfd argument, because I agree with Daniel that
we don't need to expose the implementation in the syscall name.

I hope every upcoming process-related syscall takes a pidfd by default,
so that people slowly become familiar with pidfd and, in the long run,
forget about pid and naturally replace it with pidfd.


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-23 Thread Minchan Kim
On Wed, May 22, 2019 at 09:01:33AM -0700, Daniel Colascione wrote:
> On Wed, May 22, 2019 at 9:01 AM Christian Brauner  
> wrote:
> >
> > On Wed, May 22, 2019 at 08:57:47AM -0700, Daniel Colascione wrote:
> > > On Wed, May 22, 2019 at 8:48 AM Christian Brauner  
> > > wrote:
> > > >
> > > > On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > > > > On Wed, May 22, 2019 at 7:52 AM Christian Brauner 
> > > > >  wrote:
> > > > > > I'm not going to go into yet another long argument. I prefer 
> > > > > > pidfd_*.
> > > > >
> > > > > Ok. We're each allowed our opinion.
> > > > >
> > > > > > It's tied to the api, transparent for userspace, and disambiguates 
> > > > > > it
> > > > > > from process_vm_{read,write}v that both take a pid_t.
> > > > >
> > > > > Speaking of process_vm_readv and process_vm_writev: both have a
> > > > > currently-unused flags argument. Both should grow a flag that tells
> > > > > them to interpret the pid argument as a pidfd. Or do you support
> > > > > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > > > > should process_madvise be called pidfd_madvise while process_vm_readv
> > > > > isn't called pidfd_vm_readv?
> > > >
> > > > Actually, you should then do the same with process_madvise() and give it
> > > > a flag for that too if that's not too crazy.
> > >
> > > I don't know what you mean. My gut feeling is that for the sake of
> > > consistency, process_madvise, process_vm_readv, and process_vm_writev
> > > should all accept a first argument interpreted as either a numeric PID
> > > or a pidfd depending on a flag --- ideally the same flag. Is that what
> > > you have in mind?
> >
> > Yes. For the sake of consistency they should probably all default to
> > interpret as pid and if say PROCESS_{VM_}PIDFD is passed as flag
> > interpret as pidfd.
> 
> Sounds good to me!

Then, I want to change from pidfd to pid in the next revision and stick to
process_madvise as the name. Later, you guys could define a PROCESS_PIDFD
flag and convert all the process_xxx syscalls at once.

If you are faster so that I see PROCESS_PIDFD earlier, I am happy to
use it.

Thanks.


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-22 Thread Daniel Colascione
On Wed, May 22, 2019 at 9:01 AM Christian Brauner  wrote:
>
> On Wed, May 22, 2019 at 08:57:47AM -0700, Daniel Colascione wrote:
> > On Wed, May 22, 2019 at 8:48 AM Christian Brauner  
> > wrote:
> > >
> > > On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > > > On Wed, May 22, 2019 at 7:52 AM Christian Brauner 
> > > >  wrote:
> > > > > I'm not going to go into yet another long argument. I prefer pidfd_*.
> > > >
> > > > Ok. We're each allowed our opinion.
> > > >
> > > > > It's tied to the api, transparent for userspace, and disambiguates it
> > > > > from process_vm_{read,write}v that both take a pid_t.
> > > >
> > > > Speaking of process_vm_readv and process_vm_writev: both have a
> > > > currently-unused flags argument. Both should grow a flag that tells
> > > > them to interpret the pid argument as a pidfd. Or do you support
> > > > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > > > should process_madvise be called pidfd_madvise while process_vm_readv
> > > > isn't called pidfd_vm_readv?
> > >
> > > Actually, you should then do the same with process_madvise() and give it
> > > a flag for that too if that's not too crazy.
> >
> > I don't know what you mean. My gut feeling is that for the sake of
> > consistency, process_madvise, process_vm_readv, and process_vm_writev
> > should all accept a first argument interpreted as either a numeric PID
> > or a pidfd depending on a flag --- ideally the same flag. Is that what
> > you have in mind?
>
> Yes. For the sake of consistency they should probably all default to
> interpret as pid and if say PROCESS_{VM_}PIDFD is passed as flag
> interpret as pidfd.

Sounds good to me!


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-22 Thread Christian Brauner
On Wed, May 22, 2019 at 08:57:47AM -0700, Daniel Colascione wrote:
> On Wed, May 22, 2019 at 8:48 AM Christian Brauner  
> wrote:
> >
> > On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > > On Wed, May 22, 2019 at 7:52 AM Christian Brauner  
> > > wrote:
> > > > I'm not going to go into yet another long argument. I prefer pidfd_*.
> > >
> > > Ok. We're each allowed our opinion.
> > >
> > > > It's tied to the api, transparent for userspace, and disambiguates it
> > > > from process_vm_{read,write}v that both take a pid_t.
> > >
> > > Speaking of process_vm_readv and process_vm_writev: both have a
> > > currently-unused flags argument. Both should grow a flag that tells
> > > them to interpret the pid argument as a pidfd. Or do you support
> > > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > > should process_madvise be called pidfd_madvise while process_vm_readv
> > > isn't called pidfd_vm_readv?
> >
> > Actually, you should then do the same with process_madvise() and give it
> > a flag for that too if that's not too crazy.
> 
> I don't know what you mean. My gut feeling is that for the sake of
> consistency, process_madvise, process_vm_readv, and process_vm_writev
> should all accept a first argument interpreted as either a numeric PID
> or a pidfd depending on a flag --- ideally the same flag. Is that what
> you have in mind?

Yes. For the sake of consistency they should probably all default to
interpret as pid and if say PROCESS_{VM_}PIDFD is passed as flag
interpret as pidfd.
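
To make the convention concrete, here is a minimal sketch in C of how the
first argument could be interpreted. It is only an illustration under stated
assumptions: PROCESS_PIDFD and its value, struct target and resolve_target()
are hypothetical names, not an existing kernel or libc interface. The id is
treated as a numeric PID by default and as a pidfd only when the flag is
passed; unknown flag bits are rejected so they remain available later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag name and value, borrowed from the discussion above;
 * no kernel header defines it. */
#define PROCESS_PIDFD 0x1u

/* How the first syscall argument should be interpreted. */
enum target_kind { TARGET_PID, TARGET_PIDFD };

struct target {
    enum target_kind kind;
    int id;                 /* numeric PID or pidfd, depending on kind */
};

/*
 * Decide how to interpret `id` based on `flags`.
 * Returns 0 and fills *out on success, or a negative errno value:
 *   -EINVAL for unknown flag bits or a non-positive PID,
 *   -EBADF  for a negative file descriptor.
 */
static int resolve_target(int id, unsigned int flags, struct target *out)
{
    assert(out != NULL);

    /* Reject any bit we do not understand. */
    if (flags & ~PROCESS_PIDFD)
        return -EINVAL;

    if (flags & PROCESS_PIDFD) {
        if (id < 0)
            return -EBADF;
        out->kind = TARGET_PIDFD;
    } else {
        /* Default: plain numeric PID, as process_vm_{read,write}v take today. */
        if (id <= 0)
            return -EINVAL;
        out->kind = TARGET_PID;
    }
    out->id = id;
    return 0;
}
```

Defaulting to pid keeps existing callers that pass flags == 0 working
unchanged, which is the main reason to prefer a flag over a changed default.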


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-22 Thread Daniel Colascione
On Wed, May 22, 2019 at 8:48 AM Christian Brauner  wrote:
>
> On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > On Wed, May 22, 2019 at 7:52 AM Christian Brauner  
> > wrote:
> > > I'm not going to go into yet another long argument. I prefer pidfd_*.
> >
> > Ok. We're each allowed our opinion.
> >
> > > It's tied to the api, transparent for userspace, and disambiguates it
> > > from process_vm_{read,write}v that both take a pid_t.
> >
> > Speaking of process_vm_readv and process_vm_writev: both have a
> > currently-unused flags argument. Both should grow a flag that tells
> > them to interpret the pid argument as a pidfd. Or do you support
> > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > should process_madvise be called pidfd_madvise while process_vm_readv
> > isn't called pidfd_vm_readv?
>
> Actually, you should then do the same with process_madvise() and give it
> a flag for that too if that's not too crazy.

I don't know what you mean. My gut feeling is that for the sake of
consistency, process_madvise, process_vm_readv, and process_vm_writev
should all accept a first argument interpreted as either a numeric PID
or a pidfd depending on a flag --- ideally the same flag. Is that what
you have in mind?


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-22 Thread Christian Brauner
On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> On Wed, May 22, 2019 at 7:52 AM Christian Brauner  
> wrote:
> > I'm not going to go into yet another long argument. I prefer pidfd_*.
> 
> Ok. We're each allowed our opinion.
> 
> > It's tied to the api, transparent for userspace, and disambiguates it
> > from process_vm_{read,write}v that both take a pid_t.
> 
> Speaking of process_vm_readv and process_vm_writev: both have a
> currently-unused flags argument. Both should grow a flag that tells
> them to interpret the pid argument as a pidfd. Or do you support
> adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> should process_madvise be called pidfd_madvise while process_vm_readv
> isn't called pidfd_vm_readv?

Actually, you should then do the same with process_madvise() and give it
a flag for that too if that's not too crazy.

Christian


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-22 Thread Daniel Colascione
On Wed, May 22, 2019 at 7:52 AM Christian Brauner  wrote:
> I'm not going to go into yet another long argument. I prefer pidfd_*.

Ok. We're each allowed our opinion.

> It's tied to the api, transparent for userspace, and disambiguates it
> from process_vm_{read,write}v that both take a pid_t.

Speaking of process_vm_readv and process_vm_writev: both have a
currently-unused flags argument. Both should grow a flag that tells
them to interpret the pid argument as a pidfd. Or do you support
adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
should process_madvise be called pidfd_madvise while process_vm_readv
isn't called pidfd_vm_readv?


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-22 Thread Christian Brauner
On Wed, May 22, 2019 at 06:16:35AM -0700, Daniel Colascione wrote:
> On Wed, May 22, 2019 at 1:22 AM Christian Brauner  
> wrote:
> >
> > On Wed, May 22, 2019 at 7:12 AM Daniel Colascione  wrote:
> > >
> > > On Tue, May 21, 2019 at 4:39 AM Christian Brauner  
> > > wrote:
> > > >
> > > > On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote:
> > > > > On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> > > > > > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > > > > > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > > > > > > [...]

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-22 Thread Daniel Colascione
On Wed, May 22, 2019 at 1:22 AM Christian Brauner  wrote:
>
> On Wed, May 22, 2019 at 7:12 AM Daniel Colascione  wrote:
> >
> > On Tue, May 21, 2019 at 4:39 AM Christian Brauner  
> > wrote:
> > >
> > > On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote:
> > > > On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> > > > > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > > > > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > > > > > [...]

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-22 Thread Christian Brauner
On Wed, May 22, 2019 at 7:12 AM Daniel Colascione  wrote:
>
> On Tue, May 21, 2019 at 4:39 AM Christian Brauner  
> wrote:
> >
> > On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote:
> > > On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> > > > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > > > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > > > > [...]
Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Daniel Colascione
On Tue, May 21, 2019 at 4:39 AM Christian Brauner  wrote:
>
> On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote:
> > On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> > > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > > > - Background
> > > > >
> > > > > The Android terminology used for forking a new process and starting 
> > > > > an app
> > > > > from scratch is a cold start, while resuming an existing app is a hot 
> > > > > start.
> > > > > While we continually try to improve the performance of cold starts, 
> > > > > hot
> > > > > starts will always be significantly less power hungry as well as 
> > > > > faster so
> > > > > we are trying to make hot start more likely than cold start.
> > > > >
> > > > > To increase hot start, Android userspace manages the order that apps 
> > > > > should
> > > > > be killed in a process called ActivityManagerService. 
> > > > > ActivityManagerService
> > > > > tracks every Android app or service that the user could be 
> > > > > interacting with
> > > > > at any time and translates that into a ranked list for lmkd(low memory
> > > > > killer daemon). They are likely to be killed by lmkd if the system 
> > > > > has to
> > > > > reclaim memory. In that sense they are similar to entries in any 
> > > > > other cache.
> > > > > Those apps are kept alive for opportunistic performance improvements 
> > > > > but
> > > > > those performance improvements will vary based on the memory 
> > > > > requirements of
> > > > > individual workloads.
> > > > >
> > > > > - Problem
> > > > >
> > > > > Naturally, cached apps were dominant consumers of memory on the 
> > > > > system.
> > > > > However, they were not significant consumers of swap even though they 
> > > > > are
> > > > > good candidates for swap. Under investigation, swapping out only begins
> > > > > once the low zone watermark is hit and kswapd wakes up, but the 
> > > > > overall
> > > > > allocation rate in the system might trip lmkd thresholds and cause a 
> > > > > cached
> > > > > process to be killed (we measured performance swapping out vs. zapping
> > > > > the
> > > > > memory by killing a process. Unsurprisingly, zapping is 10x
> > > > > faster
> > > > > even though we use zram which is much faster than real storage) so 
> > > > > kill
> > > > > from lmkd will often satisfy the high zone watermark, resulting in 
> > > > > very
> > > > > few pages actually being moved to swap.
> > > > >
> > > > > - Approach
> > > > >
> > > > > The approach we chose was to use a new interface to allow userspace to
> > > > > proactively reclaim entire processes by leveraging platform 
> > > > > information.
> > > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for 
> > > > > pages
> > > > > that are known to be cold from userspace and to avoid races with lmkd
> > > > > by reclaiming apps as soon as they entered the cached state. 
> > > > > Additionally,
> > > > > it could provide many chances for platform to use much information to
> > > > > optimize memory efficiency.
> > > > >
> > > > > IMHO we should spell it out that this patchset complements 
> > > > > MADV_DONTNEED
> > > > > and MADV_FREE by adding non-destructive ways to gain some free memory
> > > > > space. MADV_COLD is similar to MADV_DONTNEED in a way that it hints
> > > > > the
> > > > > kernel that memory region is not currently needed and should be 
> > > > > reclaimed
> > > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints 
> > > > > the
> > > > > kernel that memory region is not currently needed and should be 
> > > > > reclaimed
> > > > > when memory pressure rises.
> > > > >
> > > > > To achieve the goal, the patchset introduce two new options for 
> > > > > madvise.
> > > > > One is MADV_COOL which will deactive activated pages and the other is
> > > > > MADV_COLD which will reclaim private pages instantly. These new 
> > > > > options
> > > > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways 
> > > > > to
> > > > > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in 
> > > > > a way
> > > > > that it hints the kernel that memory region is not currently needed 
> > > > > and
> > > > > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in 
> > > > > a way
> > > > > that it hints the kernel that memory region is not currently needed 
> > > > > and
> > > > > should be reclaimed when memory pressure rises.
> > > > >
> > > > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> > > > > information required to make the reclaim decision is not known to the 
> > > > > app.
> > > > > Instead, it is known to a centralized userspace daemon, and that 
> > > > > daemon
> > > > > must be able to initiate reclaim on its own without any app 
> > > > > involvement.
> > > > > To solve the concern, this patch introduces new syscall 

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Brian Geffon
To expand on the ChromeOS use case: we're in a very similar situation
to Android. For example, the Chrome browser uses a separate process
for each individual tab (with some exceptions), and over time many tabs
remain open in a backgrounded or idle state. Given that we have a lot
of information about the weight of a tab, when it was last active,
etc., we can benefit tremendously from per-process reclaim. We're
working on getting real-world numbers, but all of our initial testing
shows very promising results.


On Tue, May 21, 2019 at 5:57 AM Shakeel Butt  wrote:
>
> On Mon, May 20, 2019 at 7:55 PM Anshuman Khandual
>  wrote:
> >
> >
> >
> > On 05/20/2019 10:29 PM, Tim Murray wrote:
> > > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual
> > >  wrote:
> > >>
> > >> Or Is the objective here is reduce the number of processes which get 
> > >> killed by
> > >> lmkd by triggering swapping for the unused memory (user hinted) sooner 
> > >> so that
> > >> they dont get picked by lmkd. Under utilization for zram hardware is a 
> > >> concern
> > >> here as well ?
> > >
> > > The objective is to avoid some instances of memory pressure by
> > > proactively swapping pages that userspace knows to be cold before
> > > those pages reach the end of the LRUs, which in turn can prevent some
> > > apps from being killed by lmk/lmkd. As soon as Android userspace knows
> > > that an application is not being used and is only resident to improve
> > > performance if the user returns to that app, we can kick off
> > > process_madvise on that process's pages (or some portion of those
> > > pages) in a power-efficient way to reduce memory pressure long before
> > > the system hits the free page watermark. This allows the system more
> > > time to put pages into zram versus waiting for the watermark to
> > > trigger kswapd, which decreases the likelihood that later memory
> > > allocations will cause enough pressure to trigger a kill of one of
> > > these apps.
> >
> > So this opens up bit of LRU management to user space hints. Also because 
> > the app
> > in itself wont know about the memory situation of the entire system, new 
> > system
> > call needs to be called from an external process.
> >
> > >
> > >> Swapping out memory into zram wont increase the latency for a hot start 
> > >> ? Or
> > >> is it because as it will prevent a fresh cold start which anyway will be 
> > >> slower
> > >> than a slow hot start. Just being curious.
> > >
> > > First, not all swapped pages will be reloaded immediately once an app
> > > is resumed. We've found that an app's working set post-process_madvise
> > > is significantly smaller than what an app allocates when it first
> > > launches (see the delta between pswpin and pswpout in Minchan's
> > > results). Presumably because of this, faulting to fetch from zram does
> >
> > pswpin   417613 1392647  975034 233.00
> > pswpout 1274224 2661731 1387507 108.00
> >
> > IIUC the swap-in ratio is way higher in comparison to that of swap out. Is 
> > that
> > always the case ? Or it tend to swap out from an active area of the working 
> > set
> > which faulted back again.
> >
> > > not seem to introduce a noticeable hot start penalty, nor does it
> > > cause an increase in performance problems later in the app's
> > > lifecycle. I've measured with and without process_madvise, and the
> > > differences are within our noise bounds. Second, because we're not
> >
> > That is assuming that post process_madvise() working set for the 
> > application is
> > always smaller. There is another challenge. The external process should 
> > ideally
> > have the knowledge of active areas of the working set for an application in
> > question for it to invoke process_madvise() correctly to prevent such 
> > scenarios.
> >
> > > preemptively evicting file pages and only making them more likely to
> > > be evicted when there's already memory pressure, we avoid the case
> > > where we process_madvise an app then immediately return to the app and
> > > reload all file pages in the working set even though there was no
> > > intervening memory pressure. Our initial version of this work evicted
> >
> > That would be the worst case scenario which should be avoided. Memory 
> > pressure
> > must be a parameter before actually doing the swap out. But pages if know 
> > to be
> > inactive/cold can be marked high priority to be swapped out.
> >
> > > file pages preemptively and did cause a noticeable slowdown (~15%) for
> > > that case; this patch set avoids that slowdown. Finally, the benefit
> > > from avoiding cold starts is huge. The performance improvement from
> > > having a hot start instead of a cold start ranges from 3x for very
> > > small apps to 50x+ for larger apps like high-fidelity games.
> >
> > Is there any other real world scenario apart from this app based ecosystem 
> > where
> > user hinted LRU management might be helpful ? Just being curious. Thanks 
> > for the
> > detailed 

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Shakeel Butt
On Mon, May 20, 2019 at 7:55 PM Anshuman Khandual
 wrote:
>
>
>
> On 05/20/2019 10:29 PM, Tim Murray wrote:
> > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual
> >  wrote:
> >>
> >> Or is the objective here to reduce the number of processes which get killed
> >> by lmkd, by triggering swapping for the unused memory (user hinted) sooner so
> >> that they don't get picked by lmkd? Is under-utilization of zram hardware a
> >> concern here as well?
> >
> > The objective is to avoid some instances of memory pressure by
> > proactively swapping pages that userspace knows to be cold before
> > those pages reach the end of the LRUs, which in turn can prevent some
> > apps from being killed by lmk/lmkd. As soon as Android userspace knows
> > that an application is not being used and is only resident to improve
> > performance if the user returns to that app, we can kick off
> > process_madvise on that process's pages (or some portion of those
> > pages) in a power-efficient way to reduce memory pressure long before
> > the system hits the free page watermark. This allows the system more
> > time to put pages into zram versus waiting for the watermark to
> > trigger kswapd, which decreases the likelihood that later memory
> > allocations will cause enough pressure to trigger a kill of one of
> > these apps.
>
> So this opens up a bit of LRU management to user space hints. Also, because the
> app itself won't know about the memory situation of the entire system, the new
> system call needs to be called from an external process.
>
> >
> >> Won't swapping out memory into zram increase the latency for a hot start? Or
> >> is it acceptable because it will prevent a fresh cold start, which will anyway
> >> be slower than a slow hot start? Just being curious.
> >
> > First, not all swapped pages will be reloaded immediately once an app
> > is resumed. We've found that an app's working set post-process_madvise
> > is significantly smaller than what an app allocates when it first
> > launches (see the delta between pswpin and pswpout in Minchan's
> > results). Presumably because of this, faulting to fetch from zram does
>
> pswpin   417613 1392647  975034 233.00
> pswpout 1274224 2661731 1387507 108.00
>
> IIUC the swap-in ratio is way higher in comparison to that of swap-out. Is that
> always the case? Or does it tend to swap out from an active area of the working
> set, which then faults back in again?
>
> > not seem to introduce a noticeable hot start penalty, nor does it
> > cause an increase in performance problems later in the app's
> > lifecycle. I've measured with and without process_madvise, and the
> > differences are within our noise bounds. Second, because we're not
>
> That is assuming that the post-process_madvise() working set for the application
> is always smaller. There is another challenge. The external process should
> ideally have knowledge of the active areas of the working set for the application
> in question, for it to invoke process_madvise() correctly to prevent such
> scenarios.
>
> > preemptively evicting file pages and only making them more likely to
> > be evicted when there's already memory pressure, we avoid the case
> > where we process_madvise an app then immediately return to the app and
> > reload all file pages in the working set even though there was no
> > intervening memory pressure. Our initial version of this work evicted
>
> That would be the worst-case scenario, which should be avoided. Memory pressure
> must be a parameter before actually doing the swap out. But pages, if known to
> be inactive/cold, can be marked high priority to be swapped out.
>
> > file pages preemptively and did cause a noticeable slowdown (~15%) for
> > that case; this patch set avoids that slowdown. Finally, the benefit
> > from avoiding cold starts is huge. The performance improvement from
> > having a hot start instead of a cold start ranges from 3x for very
> > small apps to 50x+ for larger apps like high-fidelity games.
>
> Is there any other real-world scenario apart from this app-based ecosystem
> where user-hinted LRU management might be helpful? Just being curious. Thanks
> for the detailed explanation. I will continue looking into this series.

Chrome OS is another real-world use case for this user-hinted LRU
management approach: it can proactively reclaim memory from tabs not
accessed by the user for some time.


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Shakeel Butt
On Sun, May 19, 2019 at 8:53 PM Minchan Kim  wrote:
>
> - Background
>
> The Android terminology used for forking a new process and starting an app
> from scratch is a cold start, while resuming an existing app is a hot start.
> While we continually try to improve the performance of cold starts, hot
> starts will always be significantly less power hungry as well as faster so
> we are trying to make hot start more likely than cold start.
>
> To increase hot start, Android userspace manages the order that apps should
> be killed in a process called ActivityManagerService. ActivityManagerService
> tracks every Android app or service that the user could be interacting with
> at any time and translates that into a ranked list for lmkd(low memory
> killer daemon). They are likely to be killed by lmkd if the system has to
> reclaim memory. In that sense they are similar to entries in any other cache.
> Those apps are kept alive for opportunistic performance improvements but
> those performance improvements will vary based on the memory requirements of
> individual workloads.
>
> - Problem
>
> Naturally, cached apps were dominant consumers of memory on the system.
> However, they were not significant consumers of swap even though they are
> good candidates for swap. On investigation, swapping out only begins
> once the low zone watermark is hit and kswapd wakes up, but the overall
> allocation rate in the system might trip lmkd thresholds and cause a cached
> process to be killed (we measured the performance of swapping out vs. zapping
> the memory by killing a process. Unsurprisingly, zapping is 10x faster
> even though we use zram, which is much faster than real storage), so a kill
> from lmkd will often satisfy the high zone watermark, resulting in very
> few pages actually being moved to swap.

It is not clear from the above paragraph what exactly the problem is. IMO
low usage of swap is not the problem; rather, global memory pressure
and the reactive response to it is the problem. Killing apps is preferred
over swap because, as you have noted, zapping frees memory faster, but it
indirectly increases cold starts. Also, swapping on allocation causes
latency issues for the app. So a proactive mechanism is needed to
keep global pressure away, which indirectly reduces cold starts and alloc
stalls.
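
To make the "proactive mechanism" concrete, here is a minimal sketch of the
per-process decision a userspace daemon might make. It is not taken from the
patchset: the enum names and choose_hint() are hypothetical, and the policy
only encodes what has been said in this thread (hint apps once they enter the
cached state, reclaim their anonymous pages right away, and merely deactivate
file pages so they go first under pressure).

```c
/* Hypothetical app states as seen by a userspace monitor. */
enum app_state { APP_FOREGROUND, APP_CACHED };

/* Kind of memory backing a range in the target process. */
enum page_kind { PAGE_ANON, PAGE_FILE };

/*
 * Hypothetical hint classes. In this RFC's naming, HINT_DEACTIVATE
 * corresponds to MADV_COOL and HINT_RECLAIM_NOW to MADV_COLD.
 */
enum reclaim_hint { HINT_NONE, HINT_DEACTIVATE, HINT_RECLAIM_NOW };

/*
 * Pick a hint for one range of one process.
 * - Apps that are not cached are left alone.
 * - Cached apps get anonymous memory reclaimed immediately (to zram/swap).
 * - Cached apps get file pages only deactivated, so they are evicted
 *   first if memory pressure actually shows up.
 */
static enum reclaim_hint choose_hint(enum app_state state, enum page_kind kind)
{
    if (state != APP_CACHED)
        return HINT_NONE;
    return kind == PAGE_ANON ? HINT_RECLAIM_NOW : HINT_DEACTIVATE;
}
```

The split between anonymous and file pages is a design choice, not a
requirement: it avoids reloading the whole file-backed working set when the
user returns to the app before any memory pressure occurred.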

>
> - Approach
>
> The approach we chose was to use a new interface to allow userspace to
> proactively reclaim entire processes by leveraging platform information.
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> that are known to be cold from userspace and to avoid races with lmkd
> by reclaiming apps as soon as they entered the cached state. Additionally,
> it could provide many chances for platform to use much information to
> optimize memory efficiency.

I think it would be good to have clear reasoning on why the "reclaim from
userspace" approach is taken. The Android runtime clearly has more
accurate stale/cold information at the app/process level and can
positively influence the kernel's reclaim decisions. So the "reclaim from
userspace" approach makes total sense for Android. I envision that
Chrome OS would be another very obvious user of this approach. There
can be tens of tabs which the user has not touched for some time.
Chrome OS can proactively reclaim memory from such tabs.
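
As a rough illustration of what such a non-destructive hint looks like from
userspace, here is a minimal sketch that hints the caller's own mapping with
plain madvise(). It deliberately does not use the external-process syscall,
whose signature is still under discussion in this thread. Assumptions: the
helper names hint_region_cold() and demo_hint_cold() are hypothetical, and the
fallback value 20 for MADV_COLD is the asm-generic value found in Linux 5.4
and later headers, where the deactivate-style hint (called MADV_COOL in this
RFC) is exposed under that name. Kernels without the hint are expected to
return EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: asm-generic value from Linux 5.4+; older headers lack it. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/*
 * Hypothetical helper: ask the kernel to deactivate the pages in
 * [addr, addr + len) so they become early reclaim candidates.
 * Returns 0 on success or a negative errno value (for example
 * -EINVAL on a kernel that does not know the hint).
 */
static int hint_region_cold(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_COLD) == 0)
        return 0;
    return -errno;
}

/*
 * Hypothetical demo: map anonymous memory, fault it in, hint it cold,
 * and check that the contents survived, since the hint is meant to be
 * non-destructive (unlike MADV_DONTNEED).
 */
static int demo_hint_cold(size_t len)
{
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -errno;

    memset(buf, 0xa5, len);                 /* touch every page */

    int ret = hint_region_cold(buf, len);
    int intact = (buf[0] == 0xa5 && buf[len - 1] == 0xa5);

    munmap(buf, len);
    if (ret == 0 && !intact)
        return -EIO;                        /* data was lost: not expected */
    return ret;
}
```

A caller would treat -EINVAL as "hint not supported here" and carry on, since
the hint is purely advisory.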

>
> IMHO we should spell it out that this patchset complements MADV_WONTNEED

MADV_DONTNEED? Same in a couple of places below.

> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> when memory pressure rises.
>
> To achieve the goal, the patchset introduces two new options for madvise.
> One is MADV_COOL, which will deactivate active pages, and the other is
> MADV_COLD which will reclaim private pages instantly. These new options
> complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way
> that it hints the kernel that memory region is not currently needed and
> should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way
> that it hints the kernel that memory region is not currently needed and
> should be reclaimed when memory pressure rises.
>
> This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> information required to make the reclaim decision is not known to the app.
> Instead, it is known to a centralized userspace daemon, and that daemon
> must be able to initiate reclaim on its own without any app involvement.
> To solve the concern, this patch introduces new syscall -
>
> struct pr_madvise_param {
> int size;
> const struct iovec *vec;
> }
>
>   

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Oleksandr Natalenko
On Tue, May 21, 2019 at 02:04:00PM +0200, Christian Brauner wrote:
> On May 21, 2019 1:41:20 PM GMT+02:00, Minchan Kim  wrote:
> >On Tue, May 21, 2019 at 01:30:32PM +0200, Christian Brauner wrote:
> >> On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> >> > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> >> > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> >> > > > - Background
> >> > > > 
> >> > > > The Android terminology used for forking a new process and
> >starting an app
> >> > > > from scratch is a cold start, while resuming an existing app is
> >a hot start.
> >> > > > While we continually try to improve the performance of cold
> >starts, hot
> >> > > > starts will always be significantly less power hungry as well
> >as faster so
> >> > > > we are trying to make hot start more likely than cold start.
> >> > > > 
> >> > > > To increase hot start, Android userspace manages the order that
> >apps should
> >> > > > be killed in a process called ActivityManagerService.
> >ActivityManagerService
> >> > > > tracks every Android app or service that the user could be
> >interacting with
> >> > > > at any time and translates that into a ranked list for lmkd(low
> >memory
> >> > > > killer daemon). They are likely to be killed by lmkd if the
> >system has to
> >> > > > reclaim memory. In that sense they are similar to entries in
> >any other cache.
> >> > > > Those apps are kept alive for opportunistic performance
> >improvements but
> >> > > > those performance improvements will vary based on the memory
> >requirements of
> >> > > > individual workloads.
> >> > > > 
> >> > > > - Problem
> >> > > > 
> >> > > > Naturally, cached apps were dominant consumers of memory on the
> >system.
> >> > > > However, they were not significant consumers of swap even
> >though they are
> >> > > > good candidate for swap. Under investigation, swapping out only
> >begins
> >> > > > once the low zone watermark is hit and kswapd wakes up, but the
> >overall
> >> > > > allocation rate in the system might trip lmkd thresholds and
> >cause a cached
> >> > > > process to be killed(we measured performance swapping out vs.
> >zapping the
> >> > > > memory by killing a process. Unsurprisingly, zapping is 10x
> >times faster
> >> > > > even though we use zram which is much faster than real storage)
> >so kill
> >> > > > from lmkd will often satisfy the high zone watermark, resulting
> >in very
> >> > > > few pages actually being moved to swap.
> >> > > > 
> >> > > > - Approach
> >> > > > 
> >> > > > The approach we chose was to use a new interface to allow
> >userspace to
> >> > > > proactively reclaim entire processes by leveraging platform
> >information.
> >> > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs
> >for pages
> >> > > > that are known to be cold from userspace and to avoid races
> >with lmkd
> >> > > > by reclaiming apps as soon as they entered the cached state.
> >Additionally,
> >> > > > it could provide many chances for platform to use much
> >information to
> >> > > > optimize memory efficiency.
> >> > > > 
> >> > > > IMHO we should spell it out that this patchset complements
> >MADV_WONTNEED
> >> > > > and MADV_FREE by adding non-destructive ways to gain some free
> >memory
> >> > > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it
> >hints the
> >> > > > kernel that memory region is not currently needed and should be
> >reclaimed
> >> > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it
> >hints the
> >> > > > kernel that memory region is not currently needed and should be
> >reclaimed
> >> > > > when memory pressure rises.
> >> > > > 
> >> > > > To achieve the goal, the patchset introduce two new options for
> >madvise.
> >> > > > One is MADV_COOL which will deactive activated pages and the
> >other is
> >> > > > MADV_COLD which will reclaim private pages instantly. These new
> >options
> >> > > > complement MADV_DONTNEED and MADV_FREE by adding
> >non-destructive ways to
> >> > > > gain some free memory space. MADV_COLD is similar to
> >MADV_DONTNEED in a way
> >> > > > that it hints the kernel that memory region is not currently
> >needed and
> >> > > > should be reclaimed immediately; MADV_COOL is similar to
> >MADV_FREE in a way
> >> > > > that it hints the kernel that memory region is not currently
> >needed and
> >> > > > should be reclaimed when memory pressure rises.
> >> > > > 
> >> > > > This approach is similar in spirit to madvise(MADV_WONTNEED),
> >but the
> >> > > > information required to make the reclaim decision is not known
> >to the app.
> >> > > > Instead, it is known to a centralized userspace daemon, and
> >that daemon
> >> > > > must be able to initiate reclaim on its own without any app
> >involvement.
> >> > > > To solve the concern, this patch introduces new syscall -
> >> > > > 
> >> > > >  struct pr_madvise_param {
> >> > > >  int size;
> >> > > >  const struct 

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Christian Brauner
On May 21, 2019 1:41:20 PM GMT+02:00, Minchan Kim  wrote:
>On Tue, May 21, 2019 at 01:30:32PM +0200, Christian Brauner wrote:
>> On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
>> > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
>> > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
>> > > > - Background
>> > > > 
>> > > > The Android terminology used for forking a new process and
>starting an app
>> > > > from scratch is a cold start, while resuming an existing app is
>a hot start.
>> > > > While we continually try to improve the performance of cold
>starts, hot
>> > > > starts will always be significantly less power hungry as well
>as faster so
>> > > > we are trying to make hot start more likely than cold start.
>> > > > 
>> > > > To increase hot start, Android userspace manages the order that
>apps should
>> > > > be killed in a process called ActivityManagerService.
>ActivityManagerService
>> > > > tracks every Android app or service that the user could be
>interacting with
>> > > > at any time and translates that into a ranked list for lmkd(low
>memory
>> > > > killer daemon). They are likely to be killed by lmkd if the
>system has to
>> > > > reclaim memory. In that sense they are similar to entries in
>any other cache.
>> > > > Those apps are kept alive for opportunistic performance
>improvements but
>> > > > those performance improvements will vary based on the memory
>requirements of
>> > > > individual workloads.
>> > > > 
>> > > > - Problem
>> > > > 
>> > > > Naturally, cached apps were dominant consumers of memory on the
>system.
>> > > > However, they were not significant consumers of swap even
>though they are
>> > > > good candidate for swap. Under investigation, swapping out only
>begins
>> > > > once the low zone watermark is hit and kswapd wakes up, but the
>overall
>> > > > allocation rate in the system might trip lmkd thresholds and
>cause a cached
>> > > > process to be killed(we measured performance swapping out vs.
>zapping the
>> > > > memory by killing a process. Unsurprisingly, zapping is 10x
>times faster
>> > > > even though we use zram which is much faster than real storage)
>so kill
>> > > > from lmkd will often satisfy the high zone watermark, resulting
>in very
>> > > > few pages actually being moved to swap.
>> > > > 
>> > > > - Approach
>> > > > 
>> > > > The approach we chose was to use a new interface to allow
>userspace to
>> > > > proactively reclaim entire processes by leveraging platform
>information.
>> > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs
>for pages
>> > > > that are known to be cold from userspace and to avoid races
>with lmkd
>> > > > by reclaiming apps as soon as they entered the cached state.
>Additionally,
>> > > > it could provide many chances for platform to use much
>information to
>> > > > optimize memory efficiency.
>> > > > 
>> > > > IMHO we should spell it out that this patchset complements
>MADV_WONTNEED
>> > > > and MADV_FREE by adding non-destructive ways to gain some free
>memory
>> > > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it
>hints the
>> > > > kernel that memory region is not currently needed and should be
>reclaimed
>> > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it
>hints the
>> > > > kernel that memory region is not currently needed and should be
>reclaimed
>> > > > when memory pressure rises.
>> > > > 
>> > > > To achieve the goal, the patchset introduce two new options for
>madvise.
>> > > > One is MADV_COOL which will deactive activated pages and the
>other is
>> > > > MADV_COLD which will reclaim private pages instantly. These new
>options
>> > > > complement MADV_DONTNEED and MADV_FREE by adding
>non-destructive ways to
>> > > > gain some free memory space. MADV_COLD is similar to
>MADV_DONTNEED in a way
>> > > > that it hints the kernel that memory region is not currently
>needed and
>> > > > should be reclaimed immediately; MADV_COOL is similar to
>MADV_FREE in a way
>> > > > that it hints the kernel that memory region is not currently
>needed and
>> > > > should be reclaimed when memory pressure rises.
>> > > > 
>> > > > This approach is similar in spirit to madvise(MADV_WONTNEED),
>but the
>> > > > information required to make the reclaim decision is not known
>to the app.
>> > > > Instead, it is known to a centralized userspace daemon, and
>that daemon
>> > > > must be able to initiate reclaim on its own without any app
>involvement.
>> > > > To solve the concern, this patch introduces new syscall -
>> > > > 
>> > > >struct pr_madvise_param {
>> > > >int size;
>> > > >const struct iovec *vec;
>> > > >}
>> > > > 
>> > > >int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
>> > > >struct pr_madvise_param *restuls,
>> > > >struct pr_madvise_param *ranges,
>> > > >

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Minchan Kim
On Tue, May 21, 2019 at 01:30:32PM +0200, Christian Brauner wrote:
> On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > > - Background
> > > > 
> > > > The Android terminology used for forking a new process and starting an 
> > > > app
> > > > from scratch is a cold start, while resuming an existing app is a hot 
> > > > start.
> > > > While we continually try to improve the performance of cold starts, hot
> > > > starts will always be significantly less power hungry as well as faster 
> > > > so
> > > > we are trying to make hot start more likely than cold start.
> > > > 
> > > > To increase hot start, Android userspace manages the order that apps 
> > > > should
> > > > be killed in a process called ActivityManagerService. 
> > > > ActivityManagerService
> > > > tracks every Android app or service that the user could be interacting 
> > > > with
> > > > at any time and translates that into a ranked list for lmkd(low memory
> > > > killer daemon). They are likely to be killed by lmkd if the system has 
> > > > to
> > > > reclaim memory. In that sense they are similar to entries in any other 
> > > > cache.
> > > > Those apps are kept alive for opportunistic performance improvements but
> > > > those performance improvements will vary based on the memory 
> > > > requirements of
> > > > individual workloads.
> > > > 
> > > > - Problem
> > > > 
> > > > Naturally, cached apps were dominant consumers of memory on the system.
> > > > However, they were not significant consumers of swap even though they 
> > > > are
> > > > good candidate for swap. Under investigation, swapping out only begins
> > > > once the low zone watermark is hit and kswapd wakes up, but the overall
> > > > allocation rate in the system might trip lmkd thresholds and cause a 
> > > > cached
> > > > process to be killed(we measured performance swapping out vs. zapping 
> > > > the
> > > > memory by killing a process. Unsurprisingly, zapping is 10x times faster
> > > > even though we use zram which is much faster than real storage) so kill
> > > > from lmkd will often satisfy the high zone watermark, resulting in very
> > > > few pages actually being moved to swap.
> > > > 
> > > > - Approach
> > > > 
> > > > The approach we chose was to use a new interface to allow userspace to
> > > > proactively reclaim entire processes by leveraging platform information.
> > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > > > that are known to be cold from userspace and to avoid races with lmkd
> > > > by reclaiming apps as soon as they entered the cached state. 
> > > > Additionally,
> > > > it could provide many chances for platform to use much information to
> > > > optimize memory efficiency.
> > > > 
> > > > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > > > and MADV_FREE by adding non-destructive ways to gain some free memory
> > > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > > > kernel that memory region is not currently needed and should be 
> > > > reclaimed
> > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints 
> > > > the
> > > > kernel that memory region is not currently needed and should be 
> > > > reclaimed
> > > > when memory pressure rises.
> > > > 
> > > > To achieve the goal, the patchset introduce two new options for madvise.
> > > > One is MADV_COOL which will deactive activated pages and the other is
> > > > MADV_COLD which will reclaim private pages instantly. These new options
> > > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> > > > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a 
> > > > way
> > > > that it hints the kernel that memory region is not currently needed and
> > > > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a 
> > > > way
> > > > that it hints the kernel that memory region is not currently needed and
> > > > should be reclaimed when memory pressure rises.
> > > > 
> > > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> > > > information required to make the reclaim decision is not known to the 
> > > > app.
> > > > Instead, it is known to a centralized userspace daemon, and that daemon
> > > > must be able to initiate reclaim on its own without any app involvement.
> > > > To solve the concern, this patch introduces new syscall -
> > > > 
> > > > struct pr_madvise_param {
> > > > int size;
> > > > const struct iovec *vec;
> > > > }
> > > > 
> > > > int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> > > > struct pr_madvise_param *restuls,
> > > > struct pr_madvise_param *ranges,
> > > >   

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Christian Brauner
On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote:
> On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > > - Background
> > > > 
> > > > The Android terminology used for forking a new process and starting an 
> > > > app
> > > > from scratch is a cold start, while resuming an existing app is a hot 
> > > > start.
> > > > While we continually try to improve the performance of cold starts, hot
> > > > starts will always be significantly less power hungry as well as faster 
> > > > so
> > > > we are trying to make hot start more likely than cold start.
> > > > 
> > > > To increase hot start, Android userspace manages the order that apps 
> > > > should
> > > > be killed in a process called ActivityManagerService. 
> > > > ActivityManagerService
> > > > tracks every Android app or service that the user could be interacting 
> > > > with
> > > > at any time and translates that into a ranked list for lmkd(low memory
> > > > killer daemon). They are likely to be killed by lmkd if the system has 
> > > > to
> > > > reclaim memory. In that sense they are similar to entries in any other 
> > > > cache.
> > > > Those apps are kept alive for opportunistic performance improvements but
> > > > those performance improvements will vary based on the memory 
> > > > requirements of
> > > > individual workloads.
> > > > 
> > > > - Problem
> > > > 
> > > > Naturally, cached apps were dominant consumers of memory on the system.
> > > > However, they were not significant consumers of swap even though they 
> > > > are
> > > > good candidate for swap. Under investigation, swapping out only begins
> > > > once the low zone watermark is hit and kswapd wakes up, but the overall
> > > > allocation rate in the system might trip lmkd thresholds and cause a 
> > > > cached
> > > > process to be killed(we measured performance swapping out vs. zapping 
> > > > the
> > > > memory by killing a process. Unsurprisingly, zapping is 10x times faster
> > > > even though we use zram which is much faster than real storage) so kill
> > > > from lmkd will often satisfy the high zone watermark, resulting in very
> > > > few pages actually being moved to swap.
> > > > 
> > > > - Approach
> > > > 
> > > > The approach we chose was to use a new interface to allow userspace to
> > > > proactively reclaim entire processes by leveraging platform information.
> > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > > > that are known to be cold from userspace and to avoid races with lmkd
> > > > by reclaiming apps as soon as they entered the cached state. 
> > > > Additionally,
> > > > it could provide many chances for platform to use much information to
> > > > optimize memory efficiency.
> > > > 
> > > > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > > > and MADV_FREE by adding non-destructive ways to gain some free memory
> > > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > > > kernel that memory region is not currently needed and should be 
> > > > reclaimed
> > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints 
> > > > the
> > > > kernel that memory region is not currently needed and should be 
> > > > reclaimed
> > > > when memory pressure rises.
> > > > 
> > > > To achieve the goal, the patchset introduce two new options for madvise.
> > > > One is MADV_COOL which will deactive activated pages and the other is
> > > > MADV_COLD which will reclaim private pages instantly. These new options
> > > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> > > > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a 
> > > > way
> > > > that it hints the kernel that memory region is not currently needed and
> > > > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a 
> > > > way
> > > > that it hints the kernel that memory region is not currently needed and
> > > > should be reclaimed when memory pressure rises.
> > > > 
> > > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> > > > information required to make the reclaim decision is not known to the 
> > > > app.
> > > > Instead, it is known to a centralized userspace daemon, and that daemon
> > > > must be able to initiate reclaim on its own without any app involvement.
> > > > To solve the concern, this patch introduces new syscall -
> > > > 
> > > > struct pr_madvise_param {
> > > > int size;
> > > > const struct iovec *vec;
> > > > }
> > > > 
> > > > int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> > > > struct pr_madvise_param *restuls,
> > > > struct pr_madvise_param *ranges,
> > > >   

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Christian Brauner
On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > - Background
> > > 
> > > The Android terminology used for forking a new process and starting an app
> > > from scratch is a cold start, while resuming an existing app is a hot 
> > > start.
> > > While we continually try to improve the performance of cold starts, hot
> > > starts will always be significantly less power hungry as well as faster so
> > > we are trying to make hot start more likely than cold start.
> > > 
> > > To increase hot start, Android userspace manages the order that apps 
> > > should
> > > be killed in a process called ActivityManagerService. 
> > > ActivityManagerService
> > > tracks every Android app or service that the user could be interacting 
> > > with
> > > at any time and translates that into a ranked list for lmkd(low memory
> > > killer daemon). They are likely to be killed by lmkd if the system has to
> > > reclaim memory. In that sense they are similar to entries in any other 
> > > cache.
> > > Those apps are kept alive for opportunistic performance improvements but
> > > those performance improvements will vary based on the memory requirements 
> > > of
> > > individual workloads.
> > > 
> > > - Problem
> > > 
> > > Naturally, cached apps were dominant consumers of memory on the system.
> > > However, they were not significant consumers of swap even though they are
> > > good candidate for swap. Under investigation, swapping out only begins
> > > once the low zone watermark is hit and kswapd wakes up, but the overall
> > > allocation rate in the system might trip lmkd thresholds and cause a 
> > > cached
> > > process to be killed(we measured performance swapping out vs. zapping the
> > > memory by killing a process. Unsurprisingly, zapping is 10x times faster
> > > even though we use zram which is much faster than real storage) so kill
> > > from lmkd will often satisfy the high zone watermark, resulting in very
> > > few pages actually being moved to swap.
> > > 
> > > - Approach
> > > 
> > > The approach we chose was to use a new interface to allow userspace to
> > > proactively reclaim entire processes by leveraging platform information.
> > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > > that are known to be cold from userspace and to avoid races with lmkd
> > > by reclaiming apps as soon as they entered the cached state. Additionally,
> > > it could provide many chances for platform to use much information to
> > > optimize memory efficiency.
> > > 
> > > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > > and MADV_FREE by adding non-destructive ways to gain some free memory
> > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > > kernel that memory region is not currently needed and should be reclaimed
> > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > > kernel that memory region is not currently needed and should be reclaimed
> > > when memory pressure rises.
> > > 
> > > To achieve the goal, the patchset introduce two new options for madvise.
> > > One is MADV_COOL which will deactive activated pages and the other is
> > > MADV_COLD which will reclaim private pages instantly. These new options
> > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> > > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a 
> > > way
> > > that it hints the kernel that memory region is not currently needed and
> > > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a 
> > > way
> > > that it hints the kernel that memory region is not currently needed and
> > > should be reclaimed when memory pressure rises.
> > > 
> > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> > > information required to make the reclaim decision is not known to the app.
> > > Instead, it is known to a centralized userspace daemon, and that daemon
> > > must be able to initiate reclaim on its own without any app involvement.
> > > To solve the concern, this patch introduces new syscall -
> > > 
> > >   struct pr_madvise_param {
> > >   int size;
> > >   const struct iovec *vec;
> > >   }
> > > 
> > >   int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> > >   struct pr_madvise_param *restuls,
> > >   struct pr_madvise_param *ranges,
> > >   unsigned long flags);
> > > 
> > > The syscall get pidfd to give hints to external process and provides
> > > pair of result/ranges vector arguments so that it could give several
> > > hints to each address range all at once.
> > > 
> > > I guess others have different ideas about the naming of syscall and 
> > > options
> > > so feel free to suggest better naming.
> > 

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Minchan Kim
On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > - Background
> > 
> > The Android terminology used for forking a new process and starting an app
> > from scratch is a cold start, while resuming an existing app is a hot start.
> > While we continually try to improve the performance of cold starts, hot
> > starts will always be significantly less power hungry as well as faster so
> > we are trying to make hot start more likely than cold start.
> > 
> > To increase hot start, Android userspace manages the order that apps should
> > be killed in a process called ActivityManagerService. ActivityManagerService
> > tracks every Android app or service that the user could be interacting with
> > at any time and translates that into a ranked list for lmkd(low memory
> > killer daemon). They are likely to be killed by lmkd if the system has to
> > reclaim memory. In that sense they are similar to entries in any other 
> > cache.
> > Those apps are kept alive for opportunistic performance improvements but
> > those performance improvements will vary based on the memory requirements of
> > individual workloads.
> > 
> > - Problem
> > 
> > Naturally, cached apps were dominant consumers of memory on the system.
> > However, they were not significant consumers of swap even though they are
> > good candidate for swap. Under investigation, swapping out only begins
> > once the low zone watermark is hit and kswapd wakes up, but the overall
> > allocation rate in the system might trip lmkd thresholds and cause a cached
> > process to be killed(we measured performance swapping out vs. zapping the
> > memory by killing a process. Unsurprisingly, zapping is 10x times faster
> > even though we use zram which is much faster than real storage) so kill
> > from lmkd will often satisfy the high zone watermark, resulting in very
> > few pages actually being moved to swap.
> > 
> > - Approach
> > 
> > The approach we chose was to use a new interface to allow userspace to
> > proactively reclaim entire processes by leveraging platform information.
> > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > that are known to be cold from userspace and to avoid races with lmkd
> > by reclaiming apps as soon as they entered the cached state. Additionally,
> > it could provide many chances for platform to use much information to
> > optimize memory efficiency.
> > 
> > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > when memory pressure rises.
> > 
> > To achieve the goal, the patchset introduce two new options for madvise.
> > One is MADV_COOL which will deactive activated pages and the other is
> > MADV_COLD which will reclaim private pages instantly. These new options
> > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way
> > that it hints the kernel that memory region is not currently needed and
> > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way
> > that it hints the kernel that memory region is not currently needed and
> > should be reclaimed when memory pressure rises.
> > 
> > This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> > information required to make the reclaim decision is not known to the app.
> > Instead, it is known to a centralized userspace daemon, and that daemon
> > must be able to initiate reclaim on its own without any app involvement.
> > To solve the concern, this patch introduces new syscall -
> > 
> > struct pr_madvise_param {
> > int size;
> > const struct iovec *vec;
> > }
> > 
> > int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> > struct pr_madvise_param *restuls,
> > struct pr_madvise_param *ranges,
> > unsigned long flags);
> > 
> > The syscall get pidfd to give hints to external process and provides
> > pair of result/ranges vector arguments so that it could give several
> > hints to each address range all at once.
> > 
> > I guess others have different ideas about the naming of syscall and options
> > so feel free to suggest better naming.
> 
> Yes, all new syscalls making use of pidfds should be named
> pidfd_. So please make this pidfd_madvise.

I don't have any particular preference, but I'm wondering why pidfd is
special enough to be the prefix of the system call name.

> 
> Please make sure to Cc me on this in 

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Michal Hocko
On Tue 21-05-19 08:25:55, Anshuman Khandual wrote:
> On 05/20/2019 10:29 PM, Tim Murray wrote:
[...]
> > not seem to introduce a noticeable hot start penalty, not does it
> > cause an increase in performance problems later in the app's
> > lifecycle. I've measured with and without process_madvise, and the
> > differences are within our noise bounds. Second, because we're not
> 
> That is assuming that post process_madvise() working set for the application 
> is
> always smaller. There is another challenge. The external process should 
> ideally
> have the knowledge of active areas of the working set for an application in
> question for it to invoke process_madvise() correctly to prevent such 
> scenarios.

But that doesn't really seem relevant for the API itself, right? The
higher level logic is the monitor's business.
-- 
Michal Hocko
SUSE Labs


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Christian Brauner
On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> - Background
> 
> The Android terminology used for forking a new process and starting an app
> from scratch is a cold start, while resuming an existing app is a hot start.
> While we continually try to improve the performance of cold starts, hot
> starts will always be significantly less power hungry as well as faster so
> we are trying to make hot start more likely than cold start.
> 
> To increase hot start, Android userspace manages the order that apps should
> be killed in a process called ActivityManagerService. ActivityManagerService
> tracks every Android app or service that the user could be interacting with
> at any time and translates that into a ranked list for lmkd(low memory
> killer daemon). They are likely to be killed by lmkd if the system has to
> reclaim memory. In that sense they are similar to entries in any other cache.
> Those apps are kept alive for opportunistic performance improvements but
> those performance improvements will vary based on the memory requirements of
> individual workloads.
> 
> - Problem
> 
> Naturally, cached apps were dominant consumers of memory on the system.
> However, they were not significant consumers of swap even though they are
> good candidates for swap. Under investigation, swapping out only begins
> once the low zone watermark is hit and kswapd wakes up, but the overall
> allocation rate in the system might trip lmkd thresholds and cause a cached
> process to be killed first. (We measured the performance of swapping out vs.
> zapping the memory by killing a process. Unsurprisingly, zapping is 10x faster
> even though we use zram, which is much faster than real storage.) So a kill
> from lmkd will often satisfy the high zone watermark, resulting in very
> few pages actually being moved to swap.
> 
> - Approach
> 
> The approach we chose was to use a new interface to allow userspace to
> proactively reclaim entire processes by leveraging platform information.
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> that are known to be cold from userspace and to avoid races with lmkd
> by reclaiming apps as soon as they entered the cached state. Additionally,
> it gives the platform many opportunities to use the information it has to
> optimize memory efficiency.
> 
> IMHO we should spell it out that this patchset complements MADV_DONTNEED
> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the
> kernel that the memory region is not currently needed and should be reclaimed
> immediately; MADV_COOL is similar to MADV_FREE in that it hints to the
> kernel that the memory region is not currently needed and should be reclaimed
> when memory pressure rises.
> 
> To achieve the goal, the patchset introduces two new options for madvise.
> One is MADV_COOL, which will deactivate active pages, and the other is
> MADV_COLD, which will reclaim private pages instantly. These new options
> complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in
> that it hints to the kernel that the memory region is not currently needed
> and should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in
> that it hints to the kernel that the memory region is not currently needed
> and should be reclaimed when memory pressure rises.
> 
> This approach is similar in spirit to madvise(MADV_DONTNEED), but the
> information required to make the reclaim decision is not known to the app.
> Instead, it is known to a centralized userspace daemon, and that daemon
> must be able to initiate reclaim on its own without any app involvement.
> To address this, the patchset introduces a new syscall:
> 
>   struct pr_madvise_param {
>           int size;
>           const struct iovec *vec;
>   };
> 
>   int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
>                       struct pr_madvise_param *results,
>                       struct pr_madvise_param *ranges,
>                       unsigned long flags);
> 
> The syscall takes a pidfd so that hints can be given to an external
> process, and it provides a pair of result/range vector arguments so that
> several hints, one per address range, can be given all at once.
> 
> I guess others have different ideas about the naming of syscall and options
> so feel free to suggest better naming.

Yes, all new syscalls making use of pidfds should be named
pidfd_. So please make this pidfd_madvise.

Please make sure to Cc me on this in the future as I'm maintaining
pidfds. Would be great to have Jann on this too since he's been touching
both mm and parts of the pidfd stuff with me.
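
For readers who want to experiment with the idea from userspace, here is a
minimal C sketch. It is not the interface proposed in this RFC: it assumes
the form that was later merged upstream in Linux 5.10,
process_madvise(pidfd, iovec, vlen, advice, flags), together with
pidfd_open(). The helper name hint_range_cold() is hypothetical, and the
fallback syscall numbers and the MADV_COLD value are assumptions for
systems whose headers predate those additions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers. Assumed values: they match the generic
 * syscall table and asm-generic/mman-common.h in mainline Linux, but a few
 * architectures number syscalls differently. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/*
 * Hypothetical helper: hint that [addr, addr + len) in process `pid` is
 * cold. `addr` must be a valid address in the *target's* address space.
 * Returns the number of bytes advised, or -1 with errno set (for example
 * ENOSYS on kernels without the syscall, EPERM without enough privilege).
 */
static ssize_t hint_range_cold(pid_t pid, void *addr, size_t len)
{
    assert(len > 0);

    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0U);
    if (pidfd < 0)
        return -1;

    struct iovec range = { .iov_base = addr, .iov_len = len };
    ssize_t ret = syscall(SYS_process_madvise, pidfd, &range, (size_t)1,
                          MADV_COLD, 0U);

    int saved = errno;   /* close() may clobber errno */
    close(pidfd);
    errno = saved;
    return ret;
}
```

A monitor would call hint_range_cold(pid, addr, len) with ranges taken from
the target's /proc/<pid>/maps. Callers should treat any failure as "no hint
given" rather than as a fatal error, since the hint is purely advisory.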


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Michal Hocko
[Cc linux-api]

On Mon 20-05-19 18:44:52, Matthew Wilcox wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > when memory pressure rises.
> 
> Do we tear down page tables for these ranges?  That seems like a good
> way of reclaiming potentially a substantial amount of memory.

I do not think we can in general because this is a non-destructive
operation. So at least we cannot tear down anonymous ptes (they will
turn into swap entries).

-- 
Michal Hocko
SUSE Labs


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-21 Thread Michal Hocko
[Cc linux-api]

On Tue 21-05-19 13:39:50, Minchan Kim wrote:
> On Mon, May 20, 2019 at 12:46:05PM -0400, Johannes Weiner wrote:
> > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > - Approach
> > > 
> > > The approach we chose was to use a new interface to allow userspace to
> > > proactively reclaim entire processes by leveraging platform information.
> > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > > that are known to be cold from userspace and to avoid races with lmkd
> > > by reclaiming apps as soon as they entered the cached state. Additionally,
> > > it could provide many chances for platform to use much information to
> > > optimize memory efficiency.
> > > 
> > > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > > and MADV_FREE by adding non-destructive ways to gain some free memory
> > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > > kernel that memory region is not currently needed and should be reclaimed
> > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > > kernel that memory region is not currently needed and should be reclaimed
> > > when memory pressure rises.
> > 
> > I agree with this approach and the semantics. But these names are very
> > vague and extremely easy to confuse since they're so similar.
> > 
> > MADV_COLD could be a good name, but for deactivating pages, not
> > reclaiming them - marking memory "cold" on the LRU for later reclaim.
> > 
> > For the immediate reclaim one, I think there is a better option too:
> > In virtual memory speak, putting a page into secondary storage (or
> > ensuring it's already there), and then freeing its in-memory copy, is
> > called "paging out". And that's what this flag is supposed to do. So
> > how about MADV_PAGEOUT?
> > 
> > With that, we'd have:
> > 
> > MADV_FREE: Mark data invalid, free memory when needed
> > MADV_DONTNEED: Mark data invalid, free memory immediately
> > 
> > MADV_COLD: Data is not used for a while, free memory when needed
> > MADV_PAGEOUT: Data is not used for a while, free memory immediately
> > 
> > What do you think?
> 
> There are several suggestions until now. Thanks, Folks!
> 
> For deactivating:
> 
> - MADV_COOL
> - MADV_RECLAIM_LAZY
> - MADV_DEACTIVATE
> - MADV_COLD
> - MADV_FREE_PRESERVE
> 
> 
> For reclaiming:
> 
> - MADV_COLD
> - MADV_RECLAIM_NOW
> - MADV_RECLAIMING
> - MADV_PAGEOUT
> - MADV_DONTNEED_PRESERVE
> 
> It seems everybody doesn't like MADV_COLD so want to go with other.
> For consisteny of view with other existing hints of madvise, -preserve
> postfix suits well. However, originally, I don't like the naming FREE
> vs DONTNEED from the beginning. They were easily confused.
> I prefer PAGEOUT to RECLAIM since it's more likely to be nuance to
> represent reclaim with memory pressure and is supposed to paged-in
> if someone need it later. So, it imply PRESERVE.
> If there is not strong against it, I want to go with MADV_COLD and
> MADV_PAGEOUT.
> 
> Other opinion?

I do not really care strongly. I am pretty sure we will have a lot of
suggestions because people tend to be good at arguing about that...
Anyway, unlike DONTNEED/FREE we do not have any other OS implementing
these features, right? So we shouldn't be tied to existing names.
On the other hand I kinda like the reference to the existing names but
DEACTIVATE/PAGEOUT seem a good fit to me as well. Unless a much better
name is suggested I would go with one of those. Up to you.
-- 
Michal Hocko
SUSE Labs


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Minchan Kim
On Tue, May 21, 2019 at 08:25:55AM +0530, Anshuman Khandual wrote:
> 
> 
> On 05/20/2019 10:29 PM, Tim Murray wrote:
> > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual
> >  wrote:
> >>
> >> Or Is the objective here is reduce the number of processes which get 
> >> killed by
> >> lmkd by triggering swapping for the unused memory (user hinted) sooner so 
> >> that
> >> they dont get picked by lmkd. Under utilization for zram hardware is a 
> >> concern
> >> here as well ?
> > 
> > The objective is to avoid some instances of memory pressure by
> > proactively swapping pages that userspace knows to be cold before
> > those pages reach the end of the LRUs, which in turn can prevent some
> > apps from being killed by lmk/lmkd. As soon as Android userspace knows
> > that an application is not being used and is only resident to improve
> > performance if the user returns to that app, we can kick off
> > process_madvise on that process's pages (or some portion of those
> > pages) in a power-efficient way to reduce memory pressure long before
> > the system hits the free page watermark. This allows the system more
> > time to put pages into zram versus waiting for the watermark to
> > trigger kswapd, which decreases the likelihood that later memory
> > allocations will cause enough pressure to trigger a kill of one of
> > these apps.
> 
> So this opens up bit of LRU management to user space hints. Also because the 
> app
> in itself wont know about the memory situation of the entire system, new 
> system
> call needs to be called from an external process.

That's why process_madvise is introduced here.

> 
> > 
> >> Swapping out memory into zram wont increase the latency for a hot start ? 
> >> Or
> >> is it because as it will prevent a fresh cold start which anyway will be 
> >> slower
> >> than a slow hot start. Just being curious.
> > 
> > First, not all swapped pages will be reloaded immediately once an app
> > is resumed. We've found that an app's working set post-process_madvise
> > is significantly smaller than what an app allocates when it first
> > launches (see the delta between pswpin and pswpout in Minchan's
> > results). Presumably because of this, faulting to fetch from zram does
> 
> pswpin      417613    1392647     975034     233.00
> pswpout    1274224    2661731    1387507     108.00
> 
> IIUC the swap-in ratio is way higher in comparison to that of swap out. Is 
> that
> always the case ? Or it tend to swap out from an active area of the working 
> set
> which faulted back again.

I think it's because apps stay alive longer since fewer of them get killed,
so what used to show up as pgpgin now shows up as swap-in.

> 
> > not seem to introduce a noticeable hot start penalty, not does it
> > cause an increase in performance problems later in the app's
> > lifecycle. I've measured with and without process_madvise, and the
> > differences are within our noise bounds. Second, because we're not
> 
> That is assuming that post process_madvise() working set for the application 
> is
> always smaller. There is another challenge. The external process should 
> ideally
> have the knowledge of active areas of the working set for an application in
> question for it to invoke process_madvise() correctly to prevent such 
> scenarios.

There are several ways to detect the working set more accurately at the cost
of runtime overhead, for example idle page tracking or clear_refs. Accuracy
of LRU aging is always a trade-off against overhead.
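
As a rough illustration of the clear_refs approach, a monitor can reset
the referenced bits of a process, wait, and then sum the "Referenced:"
fields in /proc/<pid>/smaps to estimate what was touched in the interval.
The sketch below is only an assumption about how such a monitor might be
written; the helper names clear_referenced() and sum_referenced_kb() are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Write "1" to /proc/<pid>/clear_refs to reset the referenced bits of all
 * pages mapped by the process. Returns 0 on success, -1 on failure
 * (for example when the caller lacks permission). */
static int clear_referenced(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/clear_refs", (long)pid);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = (fputs("1\n", f) >= 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Sum the "Referenced:" fields (in kB) of an smaps-formatted stream. */
static long sum_referenced_kb(FILE *smaps)
{
    assert(smaps != NULL);

    char line[512];
    long total = 0, kb;
    while (fgets(line, sizeof(line), smaps)) {
        /* Lines that do not start with "Referenced:" simply fail to match. */
        if (sscanf(line, "Referenced: %ld kB", &kb) == 1)
            total += kb;
    }
    return total;
}
```

The monitor would call clear_referenced(pid), sleep for the sampling
interval, then open /proc/<pid>/smaps and pass it to sum_referenced_kb().
Walking smaps for a large process is not free, which is the overhead side
of the trade-off mentioned above.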

> 
> > preemptively evicting file pages and only making them more likely to
> > be evicted when there's already memory pressure, we avoid the case
> > where we process_madvise an app then immediately return to the app and
> > reload all file pages in the working set even though there was no
> > intervening memory pressure. Our initial version of this work evicted
> 
> That would be the worst case scenario which should be avoided. Memory pressure
> must be a parameter before actually doing the swap out. But pages if know to 
> be
> inactive/cold can be marked high priority to be swapped out.
> 
> > file pages preemptively and did cause a noticeable slowdown (~15%) for
> > that case; this patch set avoids that slowdown. Finally, the benefit
> > from avoiding cold starts is huge. The performance improvement from
> > having a hot start instead of a cold start ranges from 3x for very
> > small apps to 50x+ for larger apps like high-fidelity games.
> 
> Is there any other real world scenario apart from this app based ecosystem 
> where
> user hinted LRU management might be helpful ? Just being curious. Thanks for 
> the
> detailed explanation. I will continue looking into this series.


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Minchan Kim
On Mon, May 20, 2019 at 06:44:52PM -0700, Matthew Wilcox wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > when memory pressure rises.
> 
> Do we tear down page tables for these ranges?  That seems like a good

True for MADV_COLD (reclaiming) but false for MADV_COOL (deactivating) in
this implementation.

> way of reclaiming potentially a substantial amount of memory.

Given that refaults are spread out over time while reclaim occurs in
bursts, it does make sense to speed up reclaim that way. However, my
concern is anonymous pages, since they need swap cache insertion, which
would be wasteful if they end up not being reclaimed.


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Minchan Kim
On Mon, May 20, 2019 at 12:46:05PM -0400, Johannes Weiner wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > - Approach
> > 
> > The approach we chose was to use a new interface to allow userspace to
> > proactively reclaim entire processes by leveraging platform information.
> > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > that are known to be cold from userspace and to avoid races with lmkd
> > by reclaiming apps as soon as they entered the cached state. Additionally,
> > it could provide many chances for platform to use much information to
> > optimize memory efficiency.
> > 
> > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > when memory pressure rises.
> 
> I agree with this approach and the semantics. But these names are very
> vague and extremely easy to confuse since they're so similar.
> 
> MADV_COLD could be a good name, but for deactivating pages, not
> reclaiming them - marking memory "cold" on the LRU for later reclaim.
> 
> For the immediate reclaim one, I think there is a better option too:
> In virtual memory speak, putting a page into secondary storage (or
> ensuring it's already there), and then freeing its in-memory copy, is
> called "paging out". And that's what this flag is supposed to do. So
> how about MADV_PAGEOUT?
> 
> With that, we'd have:
> 
> MADV_FREE: Mark data invalid, free memory when needed
> MADV_DONTNEED: Mark data invalid, free memory immediately
> 
> MADV_COLD: Data is not used for a while, free memory when needed
> MADV_PAGEOUT: Data is not used for a while, free memory immediately
> 
> What do you think?

There have been several suggestions so far. Thanks, folks!

For deactivating:

- MADV_COOL
- MADV_RECLAIM_LAZY
- MADV_DEACTIVATE
- MADV_COLD
- MADV_FREE_PRESERVE


For reclaiming:

- MADV_COLD
- MADV_RECLAIM_NOW
- MADV_RECLAIMING
- MADV_PAGEOUT
- MADV_DONTNEED_PRESERVE

It seems nobody likes MADV_COLD as currently used, so I want to go with
other names. For consistency with the other existing madvise hints, a
-PRESERVE suffix would suit well. However, I never liked the FREE vs
DONTNEED naming in the first place; the two are easily confused.
I prefer PAGEOUT to RECLAIM since it better conveys the nuance of reclaim
under memory pressure, and the memory is supposed to be paged in again if
someone needs it later. So it implies PRESERVE.
Unless there is a strong objection, I want to go with MADV_COLD and
MADV_PAGEOUT.

Any other opinions?



Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Minchan Kim
On Mon, May 20, 2019 at 04:42:00PM +0200, Oleksandr Natalenko wrote:
> Hi.
> 
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > - Background
> > 
> > The Android terminology used for forking a new process and starting an app
> > from scratch is a cold start, while resuming an existing app is a hot start.
> > While we continually try to improve the performance of cold starts, hot
> > starts will always be significantly less power hungry as well as faster so
> > we are trying to make hot start more likely than cold start.
> > 
> > To increase hot start, Android userspace manages the order that apps should
> > be killed in a process called ActivityManagerService. ActivityManagerService
> > tracks every Android app or service that the user could be interacting with
> > at any time and translates that into a ranked list for lmkd(low memory
> > killer daemon). They are likely to be killed by lmkd if the system has to
> > reclaim memory. In that sense they are similar to entries in any other cache.
> > Those apps are kept alive for opportunistic performance improvements but
> > those performance improvements will vary based on the memory requirements of
> > individual workloads.
> > 
> > - Problem
> > 
> > Naturally, cached apps were dominant consumers of memory on the system.
> > However, they were not significant consumers of swap even though they are
> > good candidate for swap. Under investigation, swapping out only begins
> > once the low zone watermark is hit and kswapd wakes up, but the overall
> > allocation rate in the system might trip lmkd thresholds and cause a cached
> > process to be killed(we measured performance swapping out vs. zapping the
> > memory by killing a process. Unsurprisingly, zapping is 10x times faster
> > even though we use zram which is much faster than real storage) so kill
> > from lmkd will often satisfy the high zone watermark, resulting in very
> > few pages actually being moved to swap.
> > 
> > - Approach
> > 
> > The approach we chose was to use a new interface to allow userspace to
> > proactively reclaim entire processes by leveraging platform information.
> > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > that are known to be cold from userspace and to avoid races with lmkd
> > by reclaiming apps as soon as they entered the cached state. Additionally,
> > it could provide many chances for platform to use much information to
> > optimize memory efficiency.
> > 
> > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > when memory pressure rises.
> > 
> > To achieve the goal, the patchset introduce two new options for madvise.
> > One is MADV_COOL which will deactive activated pages and the other is
> > MADV_COLD which will reclaim private pages instantly. These new options
> > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way
> > that it hints the kernel that memory region is not currently needed and
> > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way
> > that it hints the kernel that memory region is not currently needed and
> > should be reclaimed when memory pressure rises.
> > 
> > This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> > information required to make the reclaim decision is not known to the app.
> > Instead, it is known to a centralized userspace daemon, and that daemon
> > must be able to initiate reclaim on its own without any app involvement.
> > To solve the concern, this patch introduces new syscall -
> > 
> > struct pr_madvise_param {
> > int size;
> > const struct iovec *vec;
> > }
> > 
> > int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> > struct pr_madvise_param *restuls,
> > struct pr_madvise_param *ranges,
> > unsigned long flags);
> > 
> > The syscall get pidfd to give hints to external process and provides
> > pair of result/ranges vector arguments so that it could give several
> > hints to each address range all at once.
> > 
> > I guess others have different ideas about the naming of syscall and options
> > so feel free to suggest better naming.
> > 
> > - Experiment
> > 
> > We did bunch of testing with several hundreds of real users, not artificial
> > benchmark on android. We saw about 17% cold start decreasement without any
> > significant battery/app startup latency issues. And with artificial 
> > 

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Anshuman Khandual



On 05/20/2019 10:29 PM, Tim Murray wrote:
> On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual
>  wrote:
>>
>> Or Is the objective here is reduce the number of processes which get killed by
>> lmkd by triggering swapping for the unused memory (user hinted) sooner so that
>> they dont get picked by lmkd. Under utilization for zram hardware is a concern
>> here as well ?
> 
> The objective is to avoid some instances of memory pressure by
> proactively swapping pages that userspace knows to be cold before
> those pages reach the end of the LRUs, which in turn can prevent some
> apps from being killed by lmk/lmkd. As soon as Android userspace knows
> that an application is not being used and is only resident to improve
> performance if the user returns to that app, we can kick off
> process_madvise on that process's pages (or some portion of those
> pages) in a power-efficient way to reduce memory pressure long before
> the system hits the free page watermark. This allows the system more
> time to put pages into zram versus waiting for the watermark to
> trigger kswapd, which decreases the likelihood that later memory
> allocations will cause enough pressure to trigger a kill of one of
> these apps.

So this opens up a bit of LRU management to user space hints. Also, because the
app itself won't know about the memory situation of the entire system, the new
system call needs to be called from an external process.

> 
>> Swapping out memory into zram wont increase the latency for a hot start ? Or
>> is it because as it will prevent a fresh cold start which anyway will be slower
>> than a slow hot start. Just being curious.
> 
> First, not all swapped pages will be reloaded immediately once an app
> is resumed. We've found that an app's working set post-process_madvise
> is significantly smaller than what an app allocates when it first
> launches (see the delta between pswpin and pswpout in Minchan's
> results). Presumably because of this, faulting to fetch from zram does

pswpin      417613    1392647     975034     233.00
pswpout    1274224    2661731    1387507     108.00

IIUC the swap-in ratio is way higher than that of swap-out. Is that always the
case? Or does it tend to swap out from an active area of the working set, which
is then faulted back in again?
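
As a quick check of how the two rows above decompose, here is a small C sketch
that recomputes the delta and ratio columns from the A and B values. It assumes
the digits split as A=417613/B=1392647 for pswpin and A=1274224/B=2661731 for
pswpout, and that ratio(%) means delta relative to A. The struct and function
names are made up for illustration.

```c
#include <stdio.h>

/* One vmstat counter measured on A (vanilla) and B (process_madvise). */
struct vmstat_row {
	const char *name;
	long a;
	long b;
};

/* Absolute change from A to B. */
long row_delta(const struct vmstat_row *r)
{
	return r->b - r->a;
}

/*
 * Relative change from A to B in percent.
 * Returns 0 when A is 0 to avoid dividing by zero.
 */
double row_ratio_pct(const struct vmstat_row *r)
{
	if (r->a == 0)
		return 0.0;
	return 100.0 * (double)row_delta(r) / (double)r->a;
}

/* Print one row as: name, A, B, delta, ratio(%). */
void print_row(const struct vmstat_row *r)
{
	printf("%-8s %10ld %10ld %10ld %8.2f\n",
	       r->name, r->a, r->b, row_delta(r), row_ratio_pct(r));
}
```

With those inputs the deltas come out as 975034 and 1387507, matching the third
column, and the ratios are roughly 233% and 109%, which the pasted rows appear
to show truncated as 233.00 and 108.00.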

> not seem to introduce a noticeable hot start penalty, nor does it
> cause an increase in performance problems later in the app's
> lifecycle. I've measured with and without process_madvise, and the
> differences are within our noise bounds. Second, because we're not

That assumes the post-process_madvise() working set of the application is
always smaller. There is another challenge: the external process should ideally
know the active areas of the working set of the application in question, so
that it can invoke process_madvise() correctly and prevent such scenarios.

> preemptively evicting file pages and only making them more likely to
> be evicted when there's already memory pressure, we avoid the case
> where we process_madvise an app then immediately return to the app and
> reload all file pages in the working set even though there was no
> intervening memory pressure. Our initial version of this work evicted

That would be the worst-case scenario, which should be avoided. Memory pressure
must be a parameter before actually doing the swap out. But pages that are known
to be inactive/cold can be marked as high priority for swapping out.

> file pages preemptively and did cause a noticeable slowdown (~15%) for
> that case; this patch set avoids that slowdown. Finally, the benefit
> from avoiding cold starts is huge. The performance improvement from
> having a hot start instead of a cold start ranges from 3x for very
> small apps to 50x+ for larger apps like high-fidelity games.

Is there any other real-world scenario, apart from this app-based ecosystem,
where user-hinted LRU management might be helpful? Just curious. Thanks for the
detailed explanation. I will continue looking into this series.


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Matthew Wilcox
On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> IMHO we should spell it out that this patchset complements MADV_WONTNEED
> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> when memory pressure rises.

Do we tear down page tables for these ranges?  That seems like a good
way of reclaiming potentially a substantial amount of memory.



Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Tim Murray
On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual
 wrote:
>
> Or Is the objective here is reduce the number of processes which get killed by
> lmkd by triggering swapping for the unused memory (user hinted) sooner so that
> they dont get picked by lmkd. Under utilization for zram hardware is a concern
> here as well ?

The objective is to avoid some instances of memory pressure by
proactively swapping pages that userspace knows to be cold before
those pages reach the end of the LRUs, which in turn can prevent some
apps from being killed by lmk/lmkd. As soon as Android userspace knows
that an application is not being used and is only resident to improve
performance if the user returns to that app, we can kick off
process_madvise on that process's pages (or some portion of those
pages) in a power-efficient way to reduce memory pressure long before
the system hits the free page watermark. This allows the system more
time to put pages into zram versus waiting for the watermark to
trigger kswapd, which decreases the likelihood that later memory
allocations will cause enough pressure to trigger a kill of one of
these apps.

> Swapping out memory into zram wont increase the latency for a hot start ? Or
> is it because as it will prevent a fresh cold start which anyway will be slower
> than a slow hot start. Just being curious.

First, not all swapped pages will be reloaded immediately once an app
is resumed. We've found that an app's working set post-process_madvise
is significantly smaller than what an app allocates when it first
launches (see the delta between pswpin and pswpout in Minchan's
results). Presumably because of this, faulting to fetch from zram does
not seem to introduce a noticeable hot start penalty, nor does it
cause an increase in performance problems later in the app's
lifecycle. I've measured with and without process_madvise, and the
differences are within our noise bounds. Second, because we're not
preemptively evicting file pages and only making them more likely to
be evicted when there's already memory pressure, we avoid the case
where we process_madvise an app then immediately return to the app and
reload all file pages in the working set even though there was no
intervening memory pressure. Our initial version of this work evicted
file pages preemptively and did cause a noticeable slowdown (~15%) for
that case; this patch set avoids that slowdown. Finally, the benefit
from avoiding cold starts is huge. The performance improvement from
having a hot start instead of a cold start ranges from 3x for very
small apps to 50x+ for larger apps like high-fidelity games.


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Johannes Weiner
On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> - Approach
> 
> The approach we chose was to use a new interface to allow userspace to
> proactively reclaim entire processes by leveraging platform information.
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> that are known to be cold from userspace and to avoid races with lmkd
> by reclaiming apps as soon as they entered the cached state. Additionally,
> it could provide many chances for platform to use much information to
> optimize memory efficiency.
> 
> IMHO we should spell it out that this patchset complements MADV_WONTNEED
> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> when memory pressure rises.

I agree with this approach and the semantics. But these names are very
vague and extremely easy to confuse since they're so similar.

MADV_COLD could be a good name, but for deactivating pages, not
reclaiming them - marking memory "cold" on the LRU for later reclaim.

For the immediate reclaim one, I think there is a better option too:
In virtual memory speak, putting a page into secondary storage (or
ensuring it's already there), and then freeing its in-memory copy, is
called "paging out". And that's what this flag is supposed to do. So
how about MADV_PAGEOUT?

With that, we'd have:

MADV_FREE: Mark data invalid, free memory when needed
MADV_DONTNEED: Mark data invalid, free memory immediately

MADV_COLD: Data is not used for a while, free memory when needed
MADV_PAGEOUT: Data is not used for a while, free memory immediately

What do you think?


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Oleksandr Natalenko
Hi.

On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> - Background
> 
> The Android terminology used for forking a new process and starting an app
> from scratch is a cold start, while resuming an existing app is a hot start.
> While we continually try to improve the performance of cold starts, hot
> starts will always be significantly less power hungry as well as faster so
> we are trying to make hot start more likely than cold start.
> 
> To increase hot start, Android userspace manages the order that apps should
> be killed in a process called ActivityManagerService. ActivityManagerService
> tracks every Android app or service that the user could be interacting with
> at any time and translates that into a ranked list for lmkd(low memory
> killer daemon). They are likely to be killed by lmkd if the system has to
> reclaim memory. In that sense they are similar to entries in any other cache.
> Those apps are kept alive for opportunistic performance improvements but
> those performance improvements will vary based on the memory requirements of
> individual workloads.
> 
> - Problem
> 
> Naturally, cached apps were dominant consumers of memory on the system.
> However, they were not significant consumers of swap even though they are
> good candidate for swap. Under investigation, swapping out only begins
> once the low zone watermark is hit and kswapd wakes up, but the overall
> allocation rate in the system might trip lmkd thresholds and cause a cached
> process to be killed(we measured performance swapping out vs. zapping the
> memory by killing a process. Unsurprisingly, zapping is 10x times faster
> even though we use zram which is much faster than real storage) so kill
> from lmkd will often satisfy the high zone watermark, resulting in very
> few pages actually being moved to swap.
> 
> - Approach
> 
> The approach we chose was to use a new interface to allow userspace to
> proactively reclaim entire processes by leveraging platform information.
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> that are known to be cold from userspace and to avoid races with lmkd
> by reclaiming apps as soon as they entered the cached state. Additionally,
> it could provide many chances for platform to use much information to
> optimize memory efficiency.
> 
> IMHO we should spell it out that this patchset complements MADV_WONTNEED
> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> when memory pressure rises.
> 
> To achieve the goal, the patchset introduce two new options for madvise.
> One is MADV_COOL which will deactive activated pages and the other is
> MADV_COLD which will reclaim private pages instantly. These new options
> complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way
> that it hints the kernel that memory region is not currently needed and
> should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way
> that it hints the kernel that memory region is not currently needed and
> should be reclaimed when memory pressure rises.
> 
> This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> information required to make the reclaim decision is not known to the app.
> Instead, it is known to a centralized userspace daemon, and that daemon
> must be able to initiate reclaim on its own without any app involvement.
> To solve the concern, this patch introduces new syscall -
> 
>   struct pr_madvise_param {
>   int size;
>   const struct iovec *vec;
>   }
> 
>   int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
>   struct pr_madvise_param *restuls,
>   struct pr_madvise_param *ranges,
>   unsigned long flags);
> 
> The syscall get pidfd to give hints to external process and provides
> pair of result/ranges vector arguments so that it could give several
> hints to each address range all at once.
> 
> I guess others have different ideas about the naming of syscall and options
> so feel free to suggest better naming.
> 
> - Experiment
> 
> We did bunch of testing with several hundreds of real users, not artificial
> benchmark on android. We saw about 17% cold start decreasement without any
> significant battery/app startup latency issues. And with artificial benchmark
> which launches and switching apps, we saw average 7% app launching improvement,
> 18% less lmkd kill and good stat from vmstat.
> 
> A is vanilla and B is process_madvise.
> 
> 
>A  B   

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Michal Hocko
[Cc linux-api]

On Mon 20-05-19 12:52:47, Minchan Kim wrote:
> - Background
> 
> The Android terminology used for forking a new process and starting an app
> from scratch is a cold start, while resuming an existing app is a hot start.
> While we continually try to improve the performance of cold starts, hot
> starts will always be significantly less power hungry as well as faster so
> we are trying to make hot start more likely than cold start.
> 
> To increase hot start, Android userspace manages the order that apps should
> be killed in a process called ActivityManagerService. ActivityManagerService
> tracks every Android app or service that the user could be interacting with
> at any time and translates that into a ranked list for lmkd(low memory
> killer daemon). They are likely to be killed by lmkd if the system has to
> reclaim memory. In that sense they are similar to entries in any other cache.
> Those apps are kept alive for opportunistic performance improvements but
> those performance improvements will vary based on the memory requirements of
> individual workloads.
> 
> - Problem
> 
> Naturally, cached apps were dominant consumers of memory on the system.
> However, they were not significant consumers of swap even though they are
> good candidate for swap. Under investigation, swapping out only begins
> once the low zone watermark is hit and kswapd wakes up, but the overall
> allocation rate in the system might trip lmkd thresholds and cause a cached
> process to be killed(we measured performance swapping out vs. zapping the
> memory by killing a process. Unsurprisingly, zapping is 10x times faster
> even though we use zram which is much faster than real storage) so kill
> from lmkd will often satisfy the high zone watermark, resulting in very
> few pages actually being moved to swap.
> 
> - Approach
> 
> The approach we chose was to use a new interface to allow userspace to
> proactively reclaim entire processes by leveraging platform information.
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> that are known to be cold from userspace and to avoid races with lmkd
> by reclaiming apps as soon as they entered the cached state. Additionally,
> it could provide many chances for platform to use much information to
> optimize memory efficiency.
> 
> IMHO we should spell it out that this patchset complements MADV_WONTNEED
> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> when memory pressure rises.
> 
> To achieve the goal, the patchset introduce two new options for madvise.
> One is MADV_COOL which will deactive activated pages and the other is
> MADV_COLD which will reclaim private pages instantly. These new options
> complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way
> that it hints the kernel that memory region is not currently needed and
> should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way
> that it hints the kernel that memory region is not currently needed and
> should be reclaimed when memory pressure rises.
> 
> This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> information required to make the reclaim decision is not known to the app.
> Instead, it is known to a centralized userspace daemon, and that daemon
> must be able to initiate reclaim on its own without any app involvement.
> To solve the concern, this patch introduces new syscall -
> 
>   struct pr_madvise_param {
>   int size;
>   const struct iovec *vec;
>   }
> 
>   int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
>   struct pr_madvise_param *restuls,
>   struct pr_madvise_param *ranges,
>   unsigned long flags);
> 
> The syscall get pidfd to give hints to external process and provides
> pair of result/ranges vector arguments so that it could give several
> hints to each address range all at once.
> 
> I guess others have different ideas about the naming of syscall and options
> so feel free to suggest better naming.
> 
> - Experiment
> 
> We did bunch of testing with several hundreds of real users, not artificial
> benchmark on android. We saw about 17% cold start decreasement without any
> significant battery/app startup latency issues. And with artificial benchmark
> which launches and switching apps, we saw average 7% app launching improvement,
> 18% less lmkd kill and good stat from vmstat.
> 
> A is vanilla and B is process_madvise.
> 
> 
>A  B  

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Anshuman Khandual



On 05/20/2019 09:22 AM, Minchan Kim wrote:
> - Problem
> 
> Naturally, cached apps were dominant consumers of memory on the system.
> However, they were not significant consumers of swap even though they are
> good candidate for swap. Under investigation, swapping out only begins
> once the low zone watermark is hit and kswapd wakes up, but the overall
> allocation rate in the system might trip lmkd thresholds and cause a cached
> process to be killed(we measured performance swapping out vs. zapping the
> memory by killing a process. Unsurprisingly, zapping is 10x times faster
> even though we use zram which is much faster than real storage) so kill
> from lmkd will often satisfy the high zone watermark, resulting in very
> few pages actually being moved to swap.

Is the problem that processes get killed by lmkd, which is triggered by custom
system memory allocation parameters, and hence never get swapped out? But isn't
that problem created by lmkd itself?

Or is the objective here to reduce the number of processes that get killed by
lmkd, by triggering swapping of the unused (user-hinted) memory sooner so that
they don't get picked by lmkd? Is underutilization of zram hardware a concern
here as well?

Won't swapping memory out to zram increase the latency of a hot start? Or is
that acceptable because it prevents a fresh cold start, which would be slower
than even a slow hot start anyway? Just curious.


[RFC 0/7] introduce memory hinting API for external process

2019-05-19 Thread Minchan Kim
- Background

The Android terminology used for forking a new process and starting an app
from scratch is a cold start, while resuming an existing app is a hot start.
While we continually try to improve the performance of cold starts, hot
starts will always be significantly less power hungry as well as faster so
we are trying to make hot start more likely than cold start.

To increase hot starts, Android userspace manages the order in which apps should
be killed in a process called ActivityManagerService. ActivityManagerService
tracks every Android app or service that the user could be interacting with
at any time and translates that into a ranked list for lmkd (low memory
killer daemon). They are likely to be killed by lmkd if the system has to
reclaim memory. In that sense they are similar to entries in any other cache.
Those apps are kept alive for opportunistic performance improvements but
those performance improvements will vary based on the memory requirements of
individual workloads.

- Problem

Naturally, cached apps were dominant consumers of memory on the system.
However, they were not significant consumers of swap even though they are
good candidates for swap. On investigation, we found that swapping out only
begins once the low zone watermark is hit and kswapd wakes up, but the overall
allocation rate in the system might trip lmkd thresholds and cause a cached
process to be killed. (We measured the performance of swapping out vs. zapping
the memory by killing a process. Unsurprisingly, zapping is 10x faster
even though we use zram, which is much faster than real storage.) So a kill
from lmkd will often satisfy the high zone watermark, resulting in very
few pages actually being moved to swap.

- Approach

The approach we chose was to use a new interface to allow userspace to
proactively reclaim entire processes by leveraging platform information.
This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
that are known to be cold from userspace and to avoid races with lmkd
by reclaiming apps as soon as they entered the cached state. Additionally,
it gives the platform many opportunities to use the information it has to
optimize memory efficiency.

IMHO we should spell it out that this patchset complements MADV_DONTNEED
and MADV_FREE by adding non-destructive ways to gain some free memory
space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the
kernel that the memory region is not currently needed and should be reclaimed
immediately; MADV_COOL is similar to MADV_FREE in that it hints to the
kernel that the memory region is not currently needed and should be reclaimed
when memory pressure rises.

To achieve the goal, the patchset introduces two new options for madvise.
One is MADV_COOL, which will deactivate active pages, and the other is
MADV_COLD, which will reclaim private pages instantly. These new options
complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in
that it hints to the kernel that the memory region is not currently needed and
should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in
that it hints to the kernel that the memory region is not currently needed and
should be reclaimed when memory pressure rises.

This approach is similar in spirit to madvise(MADV_DONTNEED), but the
information required to make the reclaim decision is not known to the app.
Instead, it is known to a centralized userspace daemon, and that daemon
must be able to initiate reclaim on its own without any app involvement.
To address this, the patchset introduces a new syscall:

  struct pr_madvise_param {
          int size;
          const struct iovec *vec;
  };

  int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
                      struct pr_madvise_param *results,
                      struct pr_madvise_param *ranges,
                      unsigned long flags);

The syscall takes a pidfd to give hints to an external process and provides
a pair of result/ranges vector arguments so that it can give several
hints to each address range all at once.
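
To make the ranges argument a little more concrete, here is a hedged C sketch
of how a caller might describe several address ranges of the target process
before invoking the proposed syscall. It only builds the data structures and
does not issue the syscall, since this interface is an RFC. The helpers
fill_ranges and make_param are hypothetical, and setting size to the size of
the structure is an assumption, as the cover letter does not spell out what
the field means.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* As given in the RFC cover letter; not part of any released header. */
struct pr_madvise_param {
	int size;
	const struct iovec *vec;
};

/*
 * Hypothetical helper: describe up to cap address ranges of the target
 * process as iovec entries. Returns the number of entries written.
 */
size_t fill_ranges(struct iovec *vec, size_t cap,
		   const uintptr_t *starts, const size_t *lens, size_t nr)
{
	size_t i;

	if (nr > cap)
		nr = cap;
	for (i = 0; i < nr; i++) {
		/* Addresses are in the target's address space, never dereferenced here. */
		vec[i].iov_base = (void *)starts[i];
		vec[i].iov_len = lens[i];
	}
	return nr;
}

/* Hypothetical helper: wrap a vector in the RFC's parameter structure. */
struct pr_madvise_param make_param(const struct iovec *vec)
{
	struct pr_madvise_param p;

	p.size = (int)sizeof(p);	/* assumption: size of this structure */
	p.vec = vec;
	return p;
}
```

The number of entries returned by fill_ranges would presumably be what is
passed as nr_elem, with one behavior value per range.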

I guess others have different ideas about the naming of the syscall and options,
so feel free to suggest better naming.

- Experiment

We did a bunch of testing with several hundred real users on Android, not an
artificial benchmark. We saw about a 17% decrease in cold starts without any
significant battery or app startup latency issues. And with an artificial
benchmark which launches and switches apps, we saw an average 7% app launch
improvement, 18% fewer lmkd kills and good stats from vmstat.

A is vanilla and B is process_madvise.


                              A          B      delta   ratio(%)
   allocstall_dma             0          0          0       0.00
   allocstall_movable      1464        457      -1007     -69.00
   allocstall_normal     263210     190763     -72447