Re: [RFC 0/7] introduce memory hinting API for external process
On 05/21/2019 04:04 PM, Michal Hocko wrote:
> On Tue 21-05-19 08:25:55, Anshuman Khandual wrote:
>> On 05/20/2019 10:29 PM, Tim Murray wrote:
> [...]
>>> not seem to introduce a noticeable hot start penalty, nor does it
>>> cause an increase in performance problems later in the app's
>>> lifecycle. I've measured with and without process_madvise, and the
>>> differences are within our noise bounds. Second, because we're not
>>
>> That is assuming that post process_madvise() working set for the
>> application is always smaller. There is another challenge. The external
>> process should ideally have the knowledge of active areas of the working
>> set for an application in question for it to invoke process_madvise()
>> correctly to prevent such scenarios.
>
> But that doesn't really seem relevant for the API itself, right? The
> higher level logic is the monitor's business.

Right. I was just wondering how the monitor would even decide which areas
of the target application are active or inactive. The target application
is still just an opaque entity for the monitor unless there is some sort
of communication. But you are right, this is not relevant to the API
itself.
Re: [RFC 0/7] introduce memory hinting API for external process
On Thu, May 23, 2019 at 10:07:17PM +0900, Minchan Kim wrote:
> On Wed, May 22, 2019 at 09:01:33AM -0700, Daniel Colascione wrote:
> > On Wed, May 22, 2019 at 9:01 AM Christian Brauner wrote:
> > >
> > > On Wed, May 22, 2019 at 08:57:47AM -0700, Daniel Colascione wrote:
> > > > On Wed, May 22, 2019 at 8:48 AM Christian Brauner wrote:
> > > > >
> > > > > On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > > > > > On Wed, May 22, 2019 at 7:52 AM Christian Brauner wrote:
> > > > > > > I'm not going to go into yet another long argument. I prefer pidfd_*.
> > > > > >
> > > > > > Ok. We're each allowed our opinion.
> > > > > >
> > > > > > > It's tied to the api, transparent for userspace, and disambiguates it
> > > > > > > from process_vm_{read,write}v that both take a pid_t.
> > > > > >
> > > > > > Speaking of process_vm_readv and process_vm_writev: both have a
> > > > > > currently-unused flags argument. Both should grow a flag that tells
> > > > > > them to interpret the pid argument as a pidfd. Or do you support
> > > > > > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > > > > > should process_madvise be called pidfd_madvise while process_vm_readv
> > > > > > isn't called pidfd_vm_readv?
> > > > >
> > > > > Actually, you should then do the same with process_madvise() and give it
> > > > > a flag for that too if that's not too crazy.
> > > >
> > > > I don't know what you mean. My gut feeling is that for the sake of
> > > > consistency, process_madvise, process_vm_readv, and process_vm_writev
> > > > should all accept a first argument interpreted as either a numeric PID
> > > > or a pidfd depending on a flag --- ideally the same flag. Is that what
> > > > you have in mind?
> > >
> > > Yes. For the sake of consistency they should probably all default to
> > > interpret as pid and if say PROCESS_{VM_}PIDFD is passed as flag
> > > interpret as pidfd.
> >
> > Sounds good to me!
>
> Then, I want to change from pidfd to pid in the next revision and stick
> to process_madvise as the name. Later, you could define a PROCESS_PIDFD
> flag and convert all the process_xxx syscalls at once.
>
> If you are fast enough that I see PROCESS_PIDFD earlier, I am happy to
> use it.

Hi Folks,

I don't want to consume a new API argument too early, so I will use
process_madvise with a pidfd argument; I agree with Daniel that we don't
need to expose the implementation in the syscall name.

I hope every new process-related syscall takes a pidfd by default, so
that people gradually become familiar with pidfd, eventually forget about
pid, and naturally replace pid with pidfd in the long run.
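For readers following along, here is a rough userspace sketch of what a pidfd-based call could look like. It is not taken from this patchset: it assumes the interface as it later appeared in mainline Linux (pidfd_open(2), and a process_madvise(2) that takes an iovec array plus a flags word), which may well differ from the RFC revision under discussion. The helper names open_pidfd and hint_cold are made up for illustration, and the numeric fallbacks are the x86-64/arm64 values, used only when the installed headers lack them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers (assumption: x86-64 / arm64 values). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/* Hypothetical helper: obtain a pidfd for pid. Returns -1 with errno set
 * on failure, including on kernels that lack the syscall. */
int open_pidfd(pid_t pid)
{
	return (int)syscall(SYS_pidfd_open, pid, 0U);
}

/* Hypothetical helper: hint that [addr, addr + len) in the process behind
 * pidfd is cold. addr is an address in the *target* process. Returns the
 * number of bytes advised, or -1 with errno set. */
ssize_t hint_cold(int pidfd, void *addr, size_t len)
{
	struct iovec iov = { .iov_base = addr, .iov_len = len };

	assert(len > 0);	/* a zero-length hint is a caller bug */
	return syscall(SYS_process_madvise, pidfd, &iov, (size_t)1,
		       MADV_COLD, 0U);
}
```

A monitor would call open_pidfd() once per target and reuse the descriptor for later hints, so a recycled PID cannot redirect them to an unrelated process.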
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 09:01:33AM -0700, Daniel Colascione wrote:
> On Wed, May 22, 2019 at 9:01 AM Christian Brauner wrote:
> >
> > On Wed, May 22, 2019 at 08:57:47AM -0700, Daniel Colascione wrote:
> > > On Wed, May 22, 2019 at 8:48 AM Christian Brauner wrote:
> > > >
> > > > On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > > > > On Wed, May 22, 2019 at 7:52 AM Christian Brauner wrote:
> > > > > > I'm not going to go into yet another long argument. I prefer pidfd_*.
> > > > >
> > > > > Ok. We're each allowed our opinion.
> > > > >
> > > > > > It's tied to the api, transparent for userspace, and disambiguates it
> > > > > > from process_vm_{read,write}v that both take a pid_t.
> > > > >
> > > > > Speaking of process_vm_readv and process_vm_writev: both have a
> > > > > currently-unused flags argument. Both should grow a flag that tells
> > > > > them to interpret the pid argument as a pidfd. Or do you support
> > > > > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > > > > should process_madvise be called pidfd_madvise while process_vm_readv
> > > > > isn't called pidfd_vm_readv?
> > > >
> > > > Actually, you should then do the same with process_madvise() and give it
> > > > a flag for that too if that's not too crazy.
> > >
> > > I don't know what you mean. My gut feeling is that for the sake of
> > > consistency, process_madvise, process_vm_readv, and process_vm_writev
> > > should all accept a first argument interpreted as either a numeric PID
> > > or a pidfd depending on a flag --- ideally the same flag. Is that what
> > > you have in mind?
> >
> > Yes. For the sake of consistency they should probably all default to
> > interpret as pid and if say PROCESS_{VM_}PIDFD is passed as flag
> > interpret as pidfd.
>
> Sounds good to me!

Then, I want to change from pidfd to pid in the next revision and stick
to process_madvise as the name. Later, you could define a PROCESS_PIDFD
flag and convert all the process_xxx syscalls at once.

If you are fast enough that I see PROCESS_PIDFD earlier, I am happy to
use it.

Thanks.
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 9:01 AM Christian Brauner wrote:
>
> On Wed, May 22, 2019 at 08:57:47AM -0700, Daniel Colascione wrote:
> > On Wed, May 22, 2019 at 8:48 AM Christian Brauner wrote:
> > >
> > > On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > > > On Wed, May 22, 2019 at 7:52 AM Christian Brauner wrote:
> > > > > I'm not going to go into yet another long argument. I prefer pidfd_*.
> > > >
> > > > Ok. We're each allowed our opinion.
> > > >
> > > > > It's tied to the api, transparent for userspace, and disambiguates it
> > > > > from process_vm_{read,write}v that both take a pid_t.
> > > >
> > > > Speaking of process_vm_readv and process_vm_writev: both have a
> > > > currently-unused flags argument. Both should grow a flag that tells
> > > > them to interpret the pid argument as a pidfd. Or do you support
> > > > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > > > should process_madvise be called pidfd_madvise while process_vm_readv
> > > > isn't called pidfd_vm_readv?
> > >
> > > Actually, you should then do the same with process_madvise() and give it
> > > a flag for that too if that's not too crazy.
> >
> > I don't know what you mean. My gut feeling is that for the sake of
> > consistency, process_madvise, process_vm_readv, and process_vm_writev
> > should all accept a first argument interpreted as either a numeric PID
> > or a pidfd depending on a flag --- ideally the same flag. Is that what
> > you have in mind?
>
> Yes. For the sake of consistency they should probably all default to
> interpret as pid and if say PROCESS_{VM_}PIDFD is passed as flag
> interpret as pidfd.

Sounds good to me!
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 08:57:47AM -0700, Daniel Colascione wrote:
> On Wed, May 22, 2019 at 8:48 AM Christian Brauner wrote:
> >
> > On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > > On Wed, May 22, 2019 at 7:52 AM Christian Brauner wrote:
> > > > I'm not going to go into yet another long argument. I prefer pidfd_*.
> > >
> > > Ok. We're each allowed our opinion.
> > >
> > > > It's tied to the api, transparent for userspace, and disambiguates it
> > > > from process_vm_{read,write}v that both take a pid_t.
> > >
> > > Speaking of process_vm_readv and process_vm_writev: both have a
> > > currently-unused flags argument. Both should grow a flag that tells
> > > them to interpret the pid argument as a pidfd. Or do you support
> > > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > > should process_madvise be called pidfd_madvise while process_vm_readv
> > > isn't called pidfd_vm_readv?
> >
> > Actually, you should then do the same with process_madvise() and give it
> > a flag for that too if that's not too crazy.
>
> I don't know what you mean. My gut feeling is that for the sake of
> consistency, process_madvise, process_vm_readv, and process_vm_writev
> should all accept a first argument interpreted as either a numeric PID
> or a pidfd depending on a flag --- ideally the same flag. Is that what
> you have in mind?

Yes. For the sake of consistency they should probably all default to
interpret as pid and if say PROCESS_{VM_}PIDFD is passed as flag
interpret as pidfd.
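To make the proposed convention concrete, the following is a small userspace model of it, not kernel code. The flag name PROCESS_PIDFD comes from the message above, but its value here is invented, and resolve_target, struct target and the enum are purely illustrative names. The sketch only shows the dispatch rule being discussed: treat the first argument as a numeric PID by default, and as a pidfd only when the flag is set.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value; the thread names PROCESS_PIDFD but assigns it no number. */
#define PROCESS_PIDFD 0x1u

enum target_kind { TARGET_PID, TARGET_PIDFD };

struct target {
	enum target_kind kind;
	int id;		/* numeric PID or pidfd, depending on kind */
};

/* Model of the convention: the first argument is a PID unless
 * PROCESS_PIDFD is passed. Unknown flag bits are rejected so they remain
 * available for later extensions. Returns 0 on success, -1 on bad input. */
int resolve_target(int id, unsigned int flags, struct target *out)
{
	assert(out != NULL);

	if (flags & ~PROCESS_PIDFD)
		return -1;	/* unknown flag bits */
	if (id < 0)
		return -1;	/* neither a valid PID nor a valid fd */

	out->kind = (flags & PROCESS_PIDFD) ? TARGET_PIDFD : TARGET_PID;
	out->id = id;
	return 0;
}
```

Rejecting unknown bits matters for exactly the reason raised in the thread: process_vm_readv and process_vm_writev can only grow such a flag because their currently-unused flags argument was reserved.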
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 8:48 AM Christian Brauner wrote:
>
> On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> > On Wed, May 22, 2019 at 7:52 AM Christian Brauner wrote:
> > > I'm not going to go into yet another long argument. I prefer pidfd_*.
> >
> > Ok. We're each allowed our opinion.
> >
> > > It's tied to the api, transparent for userspace, and disambiguates it
> > > from process_vm_{read,write}v that both take a pid_t.
> >
> > Speaking of process_vm_readv and process_vm_writev: both have a
> > currently-unused flags argument. Both should grow a flag that tells
> > them to interpret the pid argument as a pidfd. Or do you support
> > adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> > should process_madvise be called pidfd_madvise while process_vm_readv
> > isn't called pidfd_vm_readv?
>
> Actually, you should then do the same with process_madvise() and give it
> a flag for that too if that's not too crazy.

I don't know what you mean. My gut feeling is that for the sake of
consistency, process_madvise, process_vm_readv, and process_vm_writev
should all accept a first argument interpreted as either a numeric PID
or a pidfd depending on a flag --- ideally the same flag. Is that what
you have in mind?
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 08:17:23AM -0700, Daniel Colascione wrote:
> On Wed, May 22, 2019 at 7:52 AM Christian Brauner wrote:
> > I'm not going to go into yet another long argument. I prefer pidfd_*.
>
> Ok. We're each allowed our opinion.
>
> > It's tied to the api, transparent for userspace, and disambiguates it
> > from process_vm_{read,write}v that both take a pid_t.
>
> Speaking of process_vm_readv and process_vm_writev: both have a
> currently-unused flags argument. Both should grow a flag that tells
> them to interpret the pid argument as a pidfd. Or do you support
> adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
> should process_madvise be called pidfd_madvise while process_vm_readv
> isn't called pidfd_vm_readv?

Actually, you should then do the same with process_madvise() and give it
a flag for that too if that's not too crazy.

Christian
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 7:52 AM Christian Brauner wrote:
> I'm not going to go into yet another long argument. I prefer pidfd_*.

Ok. We're each allowed our opinion.

> It's tied to the api, transparent for userspace, and disambiguates it
> from process_vm_{read,write}v that both take a pid_t.

Speaking of process_vm_readv and process_vm_writev: both have a
currently-unused flags argument. Both should grow a flag that tells
them to interpret the pid argument as a pidfd. Or do you support
adding pidfd_vm_readv and pidfd_vm_writev system calls? If not, why
should process_madvise be called pidfd_madvise while process_vm_readv
isn't called pidfd_vm_readv?
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 06:16:35AM -0700, Daniel Colascione wrote: > On Wed, May 22, 2019 at 1:22 AM Christian Brauner > wrote: > > > > On Wed, May 22, 2019 at 7:12 AM Daniel Colascione wrote: > > > > > > On Tue, May 21, 2019 at 4:39 AM Christian Brauner > > > wrote: > > > > > > > > On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote: > > > > > On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote: > > > > > > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote: > > > > > > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > > > > > > > > - Background > > > > > > > > > > > > > > > > The Android terminology used for forking a new process and > > > > > > > > starting an app > > > > > > > > from scratch is a cold start, while resuming an existing app is > > > > > > > > a hot start. > > > > > > > > While we continually try to improve the performance of cold > > > > > > > > starts, hot > > > > > > > > starts will always be significantly less power hungry as well > > > > > > > > as faster so > > > > > > > > we are trying to make hot start more likely than cold start. > > > > > > > > > > > > > > > > To increase hot start, Android userspace manages the order that > > > > > > > > apps should > > > > > > > > be killed in a process called ActivityManagerService. > > > > > > > > ActivityManagerService > > > > > > > > tracks every Android app or service that the user could be > > > > > > > > interacting with > > > > > > > > at any time and translates that into a ranked list for lmkd(low > > > > > > > > memory > > > > > > > > killer daemon). They are likely to be killed by lmkd if the > > > > > > > > system has to > > > > > > > > reclaim memory. In that sense they are similar to entries in > > > > > > > > any other cache. 
> > > > > > > > Those apps are kept alive for opportunistic performance > > > > > > > > improvements but > > > > > > > > those performance improvements will vary based on the memory > > > > > > > > requirements of > > > > > > > > individual workloads. > > > > > > > > > > > > > > > > - Problem > > > > > > > > > > > > > > > > Naturally, cached apps were dominant consumers of memory on the > > > > > > > > system. > > > > > > > > However, they were not significant consumers of swap even > > > > > > > > though they are > > > > > > > > good candidate for swap. Under investigation, swapping out only > > > > > > > > begins > > > > > > > > once the low zone watermark is hit and kswapd wakes up, but the > > > > > > > > overall > > > > > > > > allocation rate in the system might trip lmkd thresholds and > > > > > > > > cause a cached > > > > > > > > process to be killed(we measured performance swapping out vs. > > > > > > > > zapping the > > > > > > > > memory by killing a process. Unsurprisingly, zapping is 10x > > > > > > > > times faster > > > > > > > > even though we use zram which is much faster than real storage) > > > > > > > > so kill > > > > > > > > from lmkd will often satisfy the high zone watermark, resulting > > > > > > > > in very > > > > > > > > few pages actually being moved to swap. > > > > > > > > > > > > > > > > - Approach > > > > > > > > > > > > > > > > The approach we chose was to use a new interface to allow > > > > > > > > userspace to > > > > > > > > proactively reclaim entire processes by leveraging platform > > > > > > > > information. > > > > > > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs > > > > > > > > for pages > > > > > > > > that are known to be cold from userspace and to avoid races > > > > > > > > with lmkd > > > > > > > > by reclaiming apps as soon as they entered the cached state. 
> > > > > > > > Additionally, > > > > > > > > it could provide many chances for platform to use much > > > > > > > > information to > > > > > > > > optimize memory efficiency. > > > > > > > > > > > > > > > > IMHO we should spell it out that this patchset complements > > > > > > > > MADV_WONTNEED > > > > > > > > and MADV_FREE by adding non-destructive ways to gain some free > > > > > > > > memory > > > > > > > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it > > > > > > > > hints the > > > > > > > > kernel that memory region is not currently needed and should be > > > > > > > > reclaimed > > > > > > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it > > > > > > > > hints the > > > > > > > > kernel that memory region is not currently needed and should be > > > > > > > > reclaimed > > > > > > > > when memory pressure rises. > > > > > > > > > > > > > > > > To achieve the goal, the patchset introduce two new options for > > > > > > > > madvise. > > > > > > > > One is MADV_COOL which will deactive activated pages and the > > > > > > > > other is > > > > > > > > MADV_COLD which will reclaim private pages instantly. These new > > > > > > > > options > > > > > > > > complement MADV_DONTNEED and MADV_FREE by adding > > > > > > > > non-destructive ways to > > > > > > >
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 1:22 AM Christian Brauner wrote: > > On Wed, May 22, 2019 at 7:12 AM Daniel Colascione wrote: > > > > On Tue, May 21, 2019 at 4:39 AM Christian Brauner > > wrote: > > > > > > On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote: > > > > On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote: > > > > > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote: > > > > > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > > > > > > > - Background > > > > > > > > > > > > > > The Android terminology used for forking a new process and > > > > > > > starting an app > > > > > > > from scratch is a cold start, while resuming an existing app is a > > > > > > > hot start. > > > > > > > While we continually try to improve the performance of cold > > > > > > > starts, hot > > > > > > > starts will always be significantly less power hungry as well as > > > > > > > faster so > > > > > > > we are trying to make hot start more likely than cold start. > > > > > > > > > > > > > > To increase hot start, Android userspace manages the order that > > > > > > > apps should > > > > > > > be killed in a process called ActivityManagerService. > > > > > > > ActivityManagerService > > > > > > > tracks every Android app or service that the user could be > > > > > > > interacting with > > > > > > > at any time and translates that into a ranked list for lmkd(low > > > > > > > memory > > > > > > > killer daemon). They are likely to be killed by lmkd if the > > > > > > > system has to > > > > > > > reclaim memory. In that sense they are similar to entries in any > > > > > > > other cache. > > > > > > > Those apps are kept alive for opportunistic performance > > > > > > > improvements but > > > > > > > those performance improvements will vary based on the memory > > > > > > > requirements of > > > > > > > individual workloads. 
> > > > > > > > > > > > > > - Problem > > > > > > > > > > > > > > Naturally, cached apps were dominant consumers of memory on the > > > > > > > system. > > > > > > > However, they were not significant consumers of swap even though > > > > > > > they are > > > > > > > good candidate for swap. Under investigation, swapping out only > > > > > > > begins > > > > > > > once the low zone watermark is hit and kswapd wakes up, but the > > > > > > > overall > > > > > > > allocation rate in the system might trip lmkd thresholds and > > > > > > > cause a cached > > > > > > > process to be killed(we measured performance swapping out vs. > > > > > > > zapping the > > > > > > > memory by killing a process. Unsurprisingly, zapping is 10x times > > > > > > > faster > > > > > > > even though we use zram which is much faster than real storage) > > > > > > > so kill > > > > > > > from lmkd will often satisfy the high zone watermark, resulting > > > > > > > in very > > > > > > > few pages actually being moved to swap. > > > > > > > > > > > > > > - Approach > > > > > > > > > > > > > > The approach we chose was to use a new interface to allow > > > > > > > userspace to > > > > > > > proactively reclaim entire processes by leveraging platform > > > > > > > information. > > > > > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for > > > > > > > pages > > > > > > > that are known to be cold from userspace and to avoid races with > > > > > > > lmkd > > > > > > > by reclaiming apps as soon as they entered the cached state. > > > > > > > Additionally, > > > > > > > it could provide many chances for platform to use much > > > > > > > information to > > > > > > > optimize memory efficiency. > > > > > > > > > > > > > > IMHO we should spell it out that this patchset complements > > > > > > > MADV_WONTNEED > > > > > > > and MADV_FREE by adding non-destructive ways to gain some free > > > > > > > memory > > > > > > > space. 
MADV_COLD is similar to MADV_WONTNEED in a way that it > > > > > > > hints the > > > > > > > kernel that memory region is not currently needed and should be > > > > > > > reclaimed > > > > > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it > > > > > > > hints the > > > > > > > kernel that memory region is not currently needed and should be > > > > > > > reclaimed > > > > > > > when memory pressure rises. > > > > > > > > > > > > > > To achieve the goal, the patchset introduce two new options for > > > > > > > madvise. > > > > > > > One is MADV_COOL which will deactive activated pages and the > > > > > > > other is > > > > > > > MADV_COLD which will reclaim private pages instantly. These new > > > > > > > options > > > > > > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive > > > > > > > ways to > > > > > > > gain some free memory space. MADV_COLD is similar to > > > > > > > MADV_DONTNEED in a way > > > > > > > that it hints the kernel that memory region is not currently > > > > > > > needed and > > > > > > > should be reclaimed immediately; MADV_COOL is similar to > > > >
Re: [RFC 0/7] introduce memory hinting API for external process
On Wed, May 22, 2019 at 7:12 AM Daniel Colascione wrote: > > On Tue, May 21, 2019 at 4:39 AM Christian Brauner > wrote: > > > > On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote: > > > On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote: > > > > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote: > > > > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > > > > > > - Background > > > > > > > > > > > > The Android terminology used for forking a new process and starting > > > > > > an app > > > > > > from scratch is a cold start, while resuming an existing app is a > > > > > > hot start. > > > > > > While we continually try to improve the performance of cold starts, > > > > > > hot > > > > > > starts will always be significantly less power hungry as well as > > > > > > faster so > > > > > > we are trying to make hot start more likely than cold start. > > > > > > > > > > > > To increase hot start, Android userspace manages the order that > > > > > > apps should > > > > > > be killed in a process called ActivityManagerService. > > > > > > ActivityManagerService > > > > > > tracks every Android app or service that the user could be > > > > > > interacting with > > > > > > at any time and translates that into a ranked list for lmkd(low > > > > > > memory > > > > > > killer daemon). They are likely to be killed by lmkd if the system > > > > > > has to > > > > > > reclaim memory. In that sense they are similar to entries in any > > > > > > other cache. > > > > > > Those apps are kept alive for opportunistic performance > > > > > > improvements but > > > > > > those performance improvements will vary based on the memory > > > > > > requirements of > > > > > > individual workloads. > > > > > > > > > > > > - Problem > > > > > > > > > > > > Naturally, cached apps were dominant consumers of memory on the > > > > > > system. 
> > > > > > However, they were not significant consumers of swap even though > > > > > > they are > > > > > > good candidate for swap. Under investigation, swapping out only > > > > > > begins > > > > > > once the low zone watermark is hit and kswapd wakes up, but the > > > > > > overall > > > > > > allocation rate in the system might trip lmkd thresholds and cause > > > > > > a cached > > > > > > process to be killed(we measured performance swapping out vs. > > > > > > zapping the > > > > > > memory by killing a process. Unsurprisingly, zapping is 10x times > > > > > > faster > > > > > > even though we use zram which is much faster than real storage) so > > > > > > kill > > > > > > from lmkd will often satisfy the high zone watermark, resulting in > > > > > > very > > > > > > few pages actually being moved to swap. > > > > > > > > > > > > - Approach > > > > > > > > > > > > The approach we chose was to use a new interface to allow userspace > > > > > > to > > > > > > proactively reclaim entire processes by leveraging platform > > > > > > information. > > > > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for > > > > > > pages > > > > > > that are known to be cold from userspace and to avoid races with > > > > > > lmkd > > > > > > by reclaiming apps as soon as they entered the cached state. > > > > > > Additionally, > > > > > > it could provide many chances for platform to use much information > > > > > > to > > > > > > optimize memory efficiency. > > > > > > > > > > > > IMHO we should spell it out that this patchset complements > > > > > > MADV_WONTNEED > > > > > > and MADV_FREE by adding non-destructive ways to gain some free > > > > > > memory > > > > > > space. 
MADV_COLD is similar to MADV_WONTNEED in a way that it hints > > > > > > the > > > > > > kernel that memory region is not currently needed and should be > > > > > > reclaimed > > > > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it > > > > > > hints the > > > > > > kernel that memory region is not currently needed and should be > > > > > > reclaimed > > > > > > when memory pressure rises. > > > > > > > > > > > > To achieve the goal, the patchset introduce two new options for > > > > > > madvise. > > > > > > One is MADV_COOL which will deactive activated pages and the other > > > > > > is > > > > > > MADV_COLD which will reclaim private pages instantly. These new > > > > > > options > > > > > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive > > > > > > ways to > > > > > > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED > > > > > > in a way > > > > > > that it hints the kernel that memory region is not currently needed > > > > > > and > > > > > > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE > > > > > > in a way > > > > > > that it hints the kernel that memory region is not currently needed > > > > > > and > > > > > > should be reclaimed when memory pressure rises. > > > > > > > > > > > > This approach is similar in spirit to madvise(MADV_WONTNEED),
Re: [RFC 0/7] introduce memory hinting API for external process
On Tue, May 21, 2019 at 4:39 AM Christian Brauner wrote: > > On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote: > > On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote: > > > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote: > > > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > > > > > - Background > > > > > > > > > > The Android terminology used for forking a new process and starting > > > > > an app > > > > > from scratch is a cold start, while resuming an existing app is a hot > > > > > start. > > > > > While we continually try to improve the performance of cold starts, > > > > > hot > > > > > starts will always be significantly less power hungry as well as > > > > > faster so > > > > > we are trying to make hot start more likely than cold start. > > > > > > > > > > To increase hot start, Android userspace manages the order that apps > > > > > should > > > > > be killed in a process called ActivityManagerService. > > > > > ActivityManagerService > > > > > tracks every Android app or service that the user could be > > > > > interacting with > > > > > at any time and translates that into a ranked list for lmkd(low memory > > > > > killer daemon). They are likely to be killed by lmkd if the system > > > > > has to > > > > > reclaim memory. In that sense they are similar to entries in any > > > > > other cache. > > > > > Those apps are kept alive for opportunistic performance improvements > > > > > but > > > > > those performance improvements will vary based on the memory > > > > > requirements of > > > > > individual workloads. > > > > > > > > > > - Problem > > > > > > > > > > Naturally, cached apps were dominant consumers of memory on the > > > > > system. > > > > > However, they were not significant consumers of swap even though they > > > > > are > > > > > good candidate for swap. 
Under investigation, swapping out only begins > > > > > once the low zone watermark is hit and kswapd wakes up, but the > > > > > overall > > > > > allocation rate in the system might trip lmkd thresholds and cause a > > > > > cached > > > > > process to be killed(we measured performance swapping out vs. zapping > > > > > the > > > > > memory by killing a process. Unsurprisingly, zapping is 10x times > > > > > faster > > > > > even though we use zram which is much faster than real storage) so > > > > > kill > > > > > from lmkd will often satisfy the high zone watermark, resulting in > > > > > very > > > > > few pages actually being moved to swap. > > > > > > > > > > - Approach > > > > > > > > > > The approach we chose was to use a new interface to allow userspace to > > > > > proactively reclaim entire processes by leveraging platform > > > > > information. > > > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for > > > > > pages > > > > > that are known to be cold from userspace and to avoid races with lmkd > > > > > by reclaiming apps as soon as they entered the cached state. > > > > > Additionally, > > > > > it could provide many chances for platform to use much information to > > > > > optimize memory efficiency. > > > > > > > > > > IMHO we should spell it out that this patchset complements > > > > > MADV_WONTNEED > > > > > and MADV_FREE by adding non-destructive ways to gain some free memory > > > > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints > > > > > the > > > > > kernel that memory region is not currently needed and should be > > > > > reclaimed > > > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints > > > > > the > > > > > kernel that memory region is not currently needed and should be > > > > > reclaimed > > > > > when memory pressure rises. > > > > > > > > > > To achieve the goal, the patchset introduce two new options for > > > > > madvise. 
> > > > > One is MADV_COOL which will deactive activated pages and the other is > > > > > MADV_COLD which will reclaim private pages instantly. These new > > > > > options > > > > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways > > > > > to > > > > > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in > > > > > a way > > > > > that it hints the kernel that memory region is not currently needed > > > > > and > > > > > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in > > > > > a way > > > > > that it hints the kernel that memory region is not currently needed > > > > > and > > > > > should be reclaimed when memory pressure rises. > > > > > > > > > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the > > > > > information required to make the reclaim decision is not known to the > > > > > app. > > > > > Instead, it is known to a centralized userspace daemon, and that > > > > > daemon > > > > > must be able to initiate reclaim on its own without any app > > > > > involvement. > > > > > To solve the concern, this patch introduces new syscall
Re: [RFC 0/7] introduce memory hinting API for external process
To expand on the ChromeOS use case: we're in a very similar situation
to Android. For example, the Chrome browser uses a separate process
for each individual tab (with some exceptions), and over time many tabs
remain open in a backgrounded or idle state. Given that we have a lot
of information about the weight of a tab, when it was last active,
etc., we can benefit tremendously from per-process reclaim. We're
working on getting real-world numbers, but all of our initial testing
shows very promising results.

On Tue, May 21, 2019 at 5:57 AM Shakeel Butt wrote:
>
> On Mon, May 20, 2019 at 7:55 PM Anshuman Khandual wrote:
> >
> > On 05/20/2019 10:29 PM, Tim Murray wrote:
> > > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual wrote:
> > >>
> > >> Or Is the objective here is reduce the number of processes which get killed by
> > >> lmkd by triggering swapping for the unused memory (user hinted) sooner so that
> > >> they dont get picked by lmkd. Under utilization for zram hardware is a concern
> > >> here as well ?
> > >
> > > The objective is to avoid some instances of memory pressure by
> > > proactively swapping pages that userspace knows to be cold before
> > > those pages reach the end of the LRUs, which in turn can prevent some
> > > apps from being killed by lmk/lmkd. As soon as Android userspace knows
> > > that an application is not being used and is only resident to improve
> > > performance if the user returns to that app, we can kick off
> > > process_madvise on that process's pages (or some portion of those
> > > pages) in a power-efficient way to reduce memory pressure long before
> > > the system hits the free page watermark. This allows the system more
> > > time to put pages into zram versus waiting for the watermark to
> > > trigger kswapd, which decreases the likelihood that later memory
> > > allocations will cause enough pressure to trigger a kill of one of
> > > these apps.
> >
> > So this opens up bit of LRU management to user space hints. Also because the app
> > in itself wont know about the memory situation of the entire system, new system
> > call needs to be called from an external process.
> >
> > >
> > >> Swapping out memory into zram wont increase the latency for a hot start ? Or
> > >> is it because as it will prevent a fresh cold start which anyway will be slower
> > >> than a slow hot start. Just being curious.
> > >
> > > First, not all swapped pages will be reloaded immediately once an app
> > > is resumed. We've found that an app's working set post-process_madvise
> > > is significantly smaller than what an app allocates when it first
> > > launches (see the delta between pswpin and pswpout in Minchan's
> > > results). Presumably because of this, faulting to fetch from zram does
> >
> > pswpin       417613   1392647    975034   233.00
> > pswpout     1274224   2661731   1387507   108.00
> >
> > IIUC the swap-in ratio is way higher in comparison to that of swap out. Is that
> > always the case ? Or it tend to swap out from an active area of the working set
> > which faulted back again.
> >
> > > not seem to introduce a noticeable hot start penalty, not does it
> > > cause an increase in performance problems later in the app's
> > > lifecycle. I've measured with and without process_madvise, and the
> > > differences are within our noise bounds. Second, because we're not
> >
> > That is assuming that post process_madvise() working set for the application is
> > always smaller. There is another challenge. The external process should ideally
> > have the knowledge of active areas of the working set for an application in
> > question for it to invoke process_madvise() correctly to prevent such scenarios.
> >
> > > preemptively evicting file pages and only making them more likely to
> > > be evicted when there's already memory pressure, we avoid the case
> > > where we process_madvise an app then immediately return to the app and
> > > reload all file pages in the working set even though there was no
> > > intervening memory pressure. Our initial version of this work evicted
> >
> > That would be the worst case scenario which should be avoided. Memory pressure
> > must be a parameter before actually doing the swap out. But pages if know to be
> > inactive/cold can be marked high priority to be swapped out.
> >
> > > file pages preemptively and did cause a noticeable slowdown (~15%) for
> > > that case; this patch set avoids that slowdown. Finally, the benefit
> > > from avoiding cold starts is huge. The performance improvement from
> > > having a hot start instead of a cold start ranges from 3x for very
> > > small apps to 50x+ for larger apps like high-fidelity games.
> >
> > Is there any other real world scenario apart from this app based ecosystem where
> > user hinted LRU management might be helpful ? Just being curious. Thanks for the
> > detailed
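The tab-level policy described at the top of this message (use the weight of a tab and when it was last active to decide what to reclaim) could be sketched roughly as follows. This is only an illustration: the `Tab` record, its field names, the `reclaim_candidates` helper and the 600-second threshold are all hypothetical, and nothing in the thread describes the heuristics Chrome actually uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tab:
    # Hypothetical per-tab bookkeeping; field names are illustrative only.
    pid: int
    resident_kb: int       # "weight" of the tab's process
    last_active_s: float   # time of the last user interaction, in seconds
    foreground: bool = False


def reclaim_candidates(tabs: List[Tab], now_s: float,
                       idle_threshold_s: float = 600.0) -> List[Tab]:
    """Return backgrounded tabs idle for at least the threshold, heaviest first."""
    idle = [t for t in tabs
            if not t.foreground and now_s - t.last_active_s >= idle_threshold_s]
    return sorted(idle, key=lambda t: t.resident_kb, reverse=True)


if __name__ == "__main__":
    tabs = [
        Tab(pid=101, resident_kb=500, last_active_s=0.0),
        Tab(pid=102, resident_kb=900, last_active_s=0.0),
        Tab(pid=103, resident_kb=2000, last_active_s=950.0),   # recently used
        Tab(pid=104, resident_kb=3000, last_active_s=0.0, foreground=True),
    ]
    for tab in reclaim_candidates(tabs, now_s=1000.0):
        # A real monitor would issue the external memory hint for tab.pid here.
        print(tab.pid, tab.resident_kb)
```

The point of the sketch is only that the decision needs information the kernel does not have (foreground state, last interaction), which is why the hint has to come from an external process.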
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 7:55 PM Anshuman Khandual wrote:
>
> On 05/20/2019 10:29 PM, Tim Murray wrote:
> > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual wrote:
> >>
> >> Or Is the objective here is reduce the number of processes which get killed by
> >> lmkd by triggering swapping for the unused memory (user hinted) sooner so that
> >> they dont get picked by lmkd. Under utilization for zram hardware is a concern
> >> here as well ?
> >
> > The objective is to avoid some instances of memory pressure by
> > proactively swapping pages that userspace knows to be cold before
> > those pages reach the end of the LRUs, which in turn can prevent some
> > apps from being killed by lmk/lmkd. As soon as Android userspace knows
> > that an application is not being used and is only resident to improve
> > performance if the user returns to that app, we can kick off
> > process_madvise on that process's pages (or some portion of those
> > pages) in a power-efficient way to reduce memory pressure long before
> > the system hits the free page watermark. This allows the system more
> > time to put pages into zram versus waiting for the watermark to
> > trigger kswapd, which decreases the likelihood that later memory
> > allocations will cause enough pressure to trigger a kill of one of
> > these apps.
>
> So this opens up bit of LRU management to user space hints. Also because the app
> in itself wont know about the memory situation of the entire system, new system
> call needs to be called from an external process.
>
> >
> >> Swapping out memory into zram wont increase the latency for a hot start ? Or
> >> is it because as it will prevent a fresh cold start which anyway will be slower
> >> than a slow hot start. Just being curious.
> >
> > First, not all swapped pages will be reloaded immediately once an app
> > is resumed. We've found that an app's working set post-process_madvise
> > is significantly smaller than what an app allocates when it first
> > launches (see the delta between pswpin and pswpout in Minchan's
> > results). Presumably because of this, faulting to fetch from zram does
>
> pswpin       417613   1392647    975034   233.00
> pswpout     1274224   2661731   1387507   108.00
>
> IIUC the swap-in ratio is way higher in comparison to that of swap out. Is that
> always the case ? Or it tend to swap out from an active area of the working set
> which faulted back again.
>
> > not seem to introduce a noticeable hot start penalty, not does it
> > cause an increase in performance problems later in the app's
> > lifecycle. I've measured with and without process_madvise, and the
> > differences are within our noise bounds. Second, because we're not
>
> That is assuming that post process_madvise() working set for the application is
> always smaller. There is another challenge. The external process should ideally
> have the knowledge of active areas of the working set for an application in
> question for it to invoke process_madvise() correctly to prevent such scenarios.
>
> > preemptively evicting file pages and only making them more likely to
> > be evicted when there's already memory pressure, we avoid the case
> > where we process_madvise an app then immediately return to the app and
> > reload all file pages in the working set even though there was no
> > intervening memory pressure. Our initial version of this work evicted
>
> That would be the worst case scenario which should be avoided. Memory pressure
> must be a parameter before actually doing the swap out. But pages if know to be
> inactive/cold can be marked high priority to be swapped out.
>
> > file pages preemptively and did cause a noticeable slowdown (~15%) for
> > that case; this patch set avoids that slowdown. Finally, the benefit
> > from avoiding cold starts is huge. The performance improvement from
> > having a hot start instead of a cold start ranges from 3x for very
> > small apps to 50x+ for larger apps like high-fidelity games.
>
> Is there any other real world scenario apart from this app based ecosystem where
> user hinted LRU management might be helpful ? Just being curious. Thanks for the
> detailed explanation. I will continue looking into this series.

Chrome OS is another real-world use case for this user-hinted LRU
management approach: it can proactively reclaim memory from tabs the
user has not accessed for some time.
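The pswpin/pswpout rows quoted in this message can be checked with a little arithmetic. The sketch below reads the quoted counters as 417613 and 1392647 for pswpin, and 1274224 and 2661731 for pswpout. It assumes, since the thread does not label the columns, that the third column is the difference between the two readings and the fourth is that difference as a truncated percentage of the first reading. The helper name `delta_and_percent` is made up for illustration.

```python
from typing import Tuple

# Counter rows as quoted in the thread: (first reading, second reading).
# Assumption: column 3 is the difference, column 4 is the difference as a
# truncated percentage of the first reading.
ROWS = {
    "pswpin": (417613, 1392647),
    "pswpout": (1274224, 2661731),
}


def delta_and_percent(first: int, second: int) -> Tuple[int, int]:
    """Return (second - first, truncated percent change relative to first)."""
    delta = second - first
    return delta, delta * 100 // first


if __name__ == "__main__":
    for name, (first, second) in ROWS.items():
        delta, pct = delta_and_percent(first, second)
        print(f"{name:8s} {first:9d} {second:9d} {delta:9d} {pct:6d}.00")
```

Under that reading the computed values match the quoted 975034 / 233.00 and 1387507 / 108.00. Swap-ins grew by a larger percentage, but in absolute terms more pages were swapped out (1387507) than were swapped back in (975034).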
Re: [RFC 0/7] introduce memory hinting API for external process
On Sun, May 19, 2019 at 8:53 PM Minchan Kim wrote:
>
> - Background
>
> The Android terminology used for forking a new process and starting an app
> from scratch is a cold start, while resuming an existing app is a hot start.
> While we continually try to improve the performance of cold starts, hot
> starts will always be significantly less power hungry as well as faster so
> we are trying to make hot start more likely than cold start.
>
> To increase hot start, Android userspace manages the order that apps should
> be killed in a process called ActivityManagerService. ActivityManagerService
> tracks every Android app or service that the user could be interacting with
> at any time and translates that into a ranked list for lmkd(low memory
> killer daemon). They are likely to be killed by lmkd if the system has to
> reclaim memory. In that sense they are similar to entries in any other cache.
> Those apps are kept alive for opportunistic performance improvements but
> those performance improvements will vary based on the memory requirements of
> individual workloads.
>
> - Problem
>
> Naturally, cached apps were dominant consumers of memory on the system.
> However, they were not significant consumers of swap even though they are
> good candidate for swap. Under investigation, swapping out only begins
> once the low zone watermark is hit and kswapd wakes up, but the overall
> allocation rate in the system might trip lmkd thresholds and cause a cached
> process to be killed(we measured performance swapping out vs. zapping the
> memory by killing a process. Unsurprisingly, zapping is 10x times faster
> even though we use zram which is much faster than real storage) so kill
> from lmkd will often satisfy the high zone watermark, resulting in very
> few pages actually being moved to swap.

It is not clear from the above paragraph what exactly the problem is.
IMO low usage of swap is not the problem; rather, global memory
pressure and the reactive response to it are the problem. Killing apps
is preferred over swapping because, as you have noted, zapping frees
memory faster, but it indirectly increases cold starts. Also, swapping
on allocation causes latency issues for the app. So a proactive
mechanism is needed to keep global pressure away, which indirectly
reduces cold starts and alloc stalls.

>
> - Approach
>
> The approach we chose was to use a new interface to allow userspace to
> proactively reclaim entire processes by leveraging platform information.
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> that are known to be cold from userspace and to avoid races with lmkd
> by reclaiming apps as soon as they entered the cached state. Additionally,
> it could provide many chances for platform to use much information to
> optimize memory efficiency.

I think it would be good to have clear reasoning on why the "reclaim
from userspace" approach is taken. The Android runtime clearly has
more accurate stale/cold information at the app/process level and can
positively influence the kernel's reclaim decisions. So the "reclaim
from userspace" approach makes total sense for Android. I envision
that Chrome OS would be another very obvious user of this approach.
There can be tens of tabs which the user has not touched for some
time. Chrome OS can proactively reclaim memory from such tabs.

>
> IMHO we should spell it out that this patchset complements MADV_WONTNEED

MADV_DONTNEED? Same at a couple of places below.

> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> kernel that memory region is not currently needed and should be reclaimed
> when memory pressure rises.
>
> To achieve the goal, the patchset introduce two new options for madvise.
> One is MADV_COOL which will deactive activated pages and the other is
> MADV_COLD which will reclaim private pages instantly. These new options
> complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way
> that it hints the kernel that memory region is not currently needed and
> should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way
> that it hints the kernel that memory region is not currently needed and
> should be reclaimed when memory pressure rises.
>
> This approach is similar in spirit to madvise(MADV_WONTNEED), but the
> information required to make the reclaim decision is not known to the app.
> Instead, it is known to a centralized userspace daemon, and that daemon
> must be able to initiate reclaim on its own without any app involvement.
> To solve the concern, this patch introduces new syscall -
>
> struct pr_madvise_param {
>     int size;
>     const struct iovec *vec;
> }
>
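The difference between the two hints described in the quoted cover letter (MADV_COOL deactivates pages so they go first when pressure rises, MADV_COLD reclaims them right away, and neither discards their contents) can be illustrated with a toy model. This is a teaching sketch only: the `ToyLRU` class and its methods are hypothetical, it ignores the private/file page distinction, and it is not how the kernel's LRU lists are implemented.

```python
from collections import OrderedDict


class ToyLRU:
    """Toy two-list LRU; a teaching model, not the kernel's implementation."""

    def __init__(self, pages):
        self.active = OrderedDict((p, None) for p in pages)  # oldest first
        self.inactive = OrderedDict()
        self.swapped = set()  # contents preserved, just not resident

    def cool(self, pages):
        """MADV_COOL-like hint: deactivate pages so reclaim takes them first."""
        for p in pages:
            if p in self.active:
                del self.active[p]
                self.inactive[p] = None

    def cold(self, pages):
        """MADV_COLD-like hint: move pages to swap immediately."""
        for p in pages:
            if p in self.active:
                del self.active[p]
            elif p in self.inactive:
                del self.inactive[p]
            else:
                continue
            self.swapped.add(p)

    def reclaim(self, count):
        """Memory pressure: evict oldest inactive pages first, then active."""
        evicted = []
        for lst in (self.inactive, self.active):
            while lst and len(evicted) < count:
                page, _ = lst.popitem(last=False)
                self.swapped.add(page)
                evicted.append(page)
        return evicted

    def touch(self, page):
        """Access a page: bring it back if swapped and mark it active."""
        self.swapped.discard(page)
        self.inactive.pop(page, None)
        self.active.pop(page, None)
        self.active[page] = None


if __name__ == "__main__":
    lru = ToyLRU(["a", "b", "c", "d"])
    lru.cool(["c"])   # nothing leaves memory yet
    lru.cold(["d"])   # "d" is swapped out right away
    print(sorted(lru.swapped), lru.reclaim(1))
```

In the model, a cooled page stays resident until `reclaim` runs, while a cold page is moved out at once, and `touch` brings either back, which is the "non-destructive" property the cover letter contrasts with MADV_DONTNEED and MADV_FREE.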
Re: [RFC 0/7] introduce memory hinting API for external process
On Tue, May 21, 2019 at 02:04:00PM +0200, Christian Brauner wrote:
> On May 21, 2019 1:41:20 PM GMT+02:00, Minchan Kim wrote:
> >On Tue, May 21, 2019 at 01:30:32PM +0200, Christian Brauner wrote:
> >> On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> >> > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> >> > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> >> > > > [...]
Re: [RFC 0/7] introduce memory hinting API for external process
On May 21, 2019 1:41:20 PM GMT+02:00, Minchan Kim wrote:
>On Tue, May 21, 2019 at 01:30:32PM +0200, Christian Brauner wrote:
>> On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
>> > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
>> > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
>> > > > - Background
>> > > >
>> > > > The Android terminology used for forking a new process and starting an app
>> > > > from scratch is a cold start, while resuming an existing app is a hot start.
>> > > > While we continually try to improve the performance of cold starts, hot
>> > > > starts will always be significantly less power hungry as well as faster so
>> > > > we are trying to make hot start more likely than cold start.
>> > > >
>> > > > To increase hot start, Android userspace manages the order that apps should
>> > > > be killed in a process called ActivityManagerService. ActivityManagerService
>> > > > tracks every Android app or service that the user could be interacting with
>> > > > at any time and translates that into a ranked list for lmkd(low memory
>> > > > killer daemon). They are likely to be killed by lmkd if the system has to
>> > > > reclaim memory. In that sense they are similar to entries in any other cache.
>> > > > Those apps are kept alive for opportunistic performance improvements but
>> > > > those performance improvements will vary based on the memory requirements of
>> > > > individual workloads.
>> > > >
>> > > > - Problem
>> > > >
>> > > > Naturally, cached apps were dominant consumers of memory on the system.
>> > > > However, they were not significant consumers of swap even though they are
>> > > > good candidate for swap. Under investigation, swapping out only begins
>> > > > once the low zone watermark is hit and kswapd wakes up, but the overall
>> > > > allocation rate in the system might trip lmkd thresholds and cause a cached
>> > > > process to be killed(we measured performance swapping out vs. zapping the
>> > > > memory by killing a process. Unsurprisingly, zapping is 10x times faster
>> > > > even though we use zram which is much faster than real storage) so kill
>> > > > from lmkd will often satisfy the high zone watermark, resulting in very
>> > > > few pages actually being moved to swap.
>> > > >
>> > > > - Approach
>> > > >
>> > > > The approach we chose was to use a new interface to allow userspace to
>> > > > proactively reclaim entire processes by leveraging platform information.
>> > > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
>> > > > that are known to be cold from userspace and to avoid races with lmkd
>> > > > by reclaiming apps as soon as they entered the cached state. Additionally,
>> > > > it could provide many chances for platform to use much information to
>> > > > optimize memory efficiency.
>> > > >
>> > > > IMHO we should spell it out that this patchset complements MADV_WONTNEED
>> > > > and MADV_FREE by adding non-destructive ways to gain some free memory
>> > > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
>> > > > kernel that memory region is not currently needed and should be reclaimed
>> > > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
>> > > > kernel that memory region is not currently needed and should be reclaimed
>> > > > when memory pressure rises.
>> > > >
>> > > > To achieve the goal, the patchset introduce two new options for madvise.
>> > > > One is MADV_COOL which will deactive activated pages and the other is
>> > > > MADV_COLD which will reclaim private pages instantly. These new options
>> > > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
>> > > > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way
>> > > > that it hints the kernel that memory region is not currently needed and
>> > > > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way
>> > > > that it hints the kernel that memory region is not currently needed and
>> > > > should be reclaimed when memory pressure rises.
>> > > >
>> > > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the
>> > > > information required to make the reclaim decision is not known to the app.
>> > > > Instead, it is known to a centralized userspace daemon, and that daemon
>> > > > must be able to initiate reclaim on its own without any app involvement.
>> > > > To solve the concern, this patch introduces new syscall -
>> > > >
>> > > > struct pr_madvise_param {
>> > > >     int size;
>> > > >     const struct iovec *vec;
>> > > > }
>> > > >
>> > > > int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
>> > > >         struct pr_madvise_param *restuls,
>> > > >         struct pr_madvise_param *ranges,
Re: [RFC 0/7] introduce memory hinting API for external process
On Tue, May 21, 2019 at 01:30:32PM +0200, Christian Brauner wrote:
> On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > > [...]
Re: [RFC 0/7] introduce memory hinting API for external process
On Tue, May 21, 2019 at 01:30:29PM +0200, Christian Brauner wrote:
> On Tue, May 21, 2019 at 08:05:52PM +0900, Minchan Kim wrote:
> > On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > > [...]
Re: [RFC 0/7] introduce memory hinting API for external process
On Tue, May 21, 2019 at 10:42:00AM +0200, Christian Brauner wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
[...]
> > To solve the concern, this patch introduces new syscall -
> >
> >     struct pr_madvise_param {
> >         int size;
> >         const struct iovec *vec;
> >     };
> >
> >     int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> >                         struct pr_madvise_param *results,
> >                         struct pr_madvise_param *ranges,
> >                         unsigned long flags);
> >
> > The syscall takes a pidfd to give hints to an external process and provides
> > a pair of result/ranges vector arguments so that it can give several
> > hints, one per address range, all at once.
> >
> > I guess others have different ideas about the naming of syscall and options
> > so feel free to suggest better naming.
>
> Yes, all new syscalls making use of pidfds should be named
> pidfd_. So please make this pidfd_madvise.

I don't have any particular preference, but I'm just wondering why pidfd is so
special that it should be the prefix of the system call name.

> Please make sure to Cc me on this in
Re: [RFC 0/7] introduce memory hinting API for external process
On Tue 21-05-19 08:25:55, Anshuman Khandual wrote:
> On 05/20/2019 10:29 PM, Tim Murray wrote:
[...]
> > not seem to introduce a noticeable hot start penalty, nor does it
> > cause an increase in performance problems later in the app's
> > lifecycle. I've measured with and without process_madvise, and the
> > differences are within our noise bounds. Second, because we're not
>
> That is assuming that post process_madvise() working set for the application
> is always smaller. There is another challenge. The external process should
> ideally have the knowledge of active areas of the working set for an
> application in question for it to invoke process_madvise() correctly to
> prevent such scenarios.

But that doesn't really seem relevant for the API itself, right? The
higher-level logic is the monitor's business.
-- 
Michal Hocko
SUSE Labs
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> - Background
>
> The Android terminology used for forking a new process and starting an app
> from scratch is a cold start, while resuming an existing app is a hot start.
> While we continually try to improve the performance of cold starts, hot
> starts will always be significantly less power hungry as well as faster so
> we are trying to make hot start more likely than cold start.
>
> To increase hot start, Android userspace manages the order that apps should
> be killed in a process called ActivityManagerService. ActivityManagerService
> tracks every Android app or service that the user could be interacting with
> at any time and translates that into a ranked list for lmkd (low memory
> killer daemon). They are likely to be killed by lmkd if the system has to
> reclaim memory. In that sense they are similar to entries in any other cache.
> Those apps are kept alive for opportunistic performance improvements but
> those performance improvements will vary based on the memory requirements of
> individual workloads.
>
> - Problem
>
> Naturally, cached apps were dominant consumers of memory on the system.
> However, they were not significant consumers of swap even though they are
> good candidates for swap. Under investigation, we found that swapping out
> only begins once the low zone watermark is hit and kswapd wakes up, but the
> overall allocation rate in the system might trip lmkd thresholds and cause a
> cached process to be killed (we measured the performance of swapping out vs.
> zapping the memory by killing a process; unsurprisingly, zapping is 10x
> faster even though we use zram, which is much faster than real storage), so
> a kill from lmkd will often satisfy the high zone watermark, resulting in
> very few pages actually being moved to swap.
>
> - Approach
>
> The approach we chose was to use a new interface to allow userspace to
> proactively reclaim entire processes by leveraging platform information.
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> that are known to be cold from userspace and to avoid races with lmkd
> by reclaiming apps as soon as they entered the cached state. Additionally,
> it could give the platform many chances to use the information it has to
> optimize memory efficiency.
>
> IMHO we should spell it out that this patchset complements MADV_DONTNEED
> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the
> kernel that the memory region is not currently needed and should be
> reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints
> to the kernel that the memory region is not currently needed and should be
> reclaimed when memory pressure rises.
>
> To achieve the goal, the patchset introduces two new options for madvise.
> One is MADV_COOL, which will deactivate activated pages, and the other is
> MADV_COLD, which will reclaim private pages instantly. These new options
> complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in that
> it hints to the kernel that the memory region is not currently needed and
> should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in that
> it hints to the kernel that the memory region is not currently needed and
> should be reclaimed when memory pressure rises.
>
> This approach is similar in spirit to madvise(MADV_DONTNEED), but the
> information required to make the reclaim decision is not known to the app.
> Instead, it is known to a centralized userspace daemon, and that daemon
> must be able to initiate reclaim on its own without any app involvement.
> To solve the concern, this patch introduces new syscall -
>
>     struct pr_madvise_param {
>         int size;
>         const struct iovec *vec;
>     };
>
>     int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
>                         struct pr_madvise_param *results,
>                         struct pr_madvise_param *ranges,
>                         unsigned long flags);
>
> The syscall takes a pidfd to give hints to an external process and provides
> a pair of result/ranges vector arguments so that it can give several
> hints, one per address range, all at once.
>
> I guess others have different ideas about the naming of syscall and options
> so feel free to suggest better naming.

Yes, all new syscalls making use of pidfds should be named
pidfd_. So please make this pidfd_madvise.

Please make sure to Cc me on this in the future as I'm maintaining
pidfds. Would be great to have Jann on this too since he's been touching
both mm and parts of the pidfd stuff with me.
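A minimal sketch of how a caller might fill the `ranges` argument of the interface quoted above. The struct layout is copied from the cover letter; the helper name `pr_madvise_set_ranges` is hypothetical, and treating `size` as the size of the structure itself is an assumption, since the cover letter does not say what the field holds. The syscall is not invoked here because it exists only in this RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Layout as posted in the RFC cover letter. */
struct pr_madvise_param {
    int size;                /* ASSUMPTION: sizeof(struct pr_madvise_param) */
    const struct iovec *vec; /* one entry per address range */
};

/*
 * Hypothetical helper, not part of the RFC: describe n address ranges.
 * `storage` must have room for n entries and must outlive `param`.
 * The addresses are meant to be valid in the target process, not the caller.
 */
static void pr_madvise_set_ranges(struct pr_madvise_param *param,
                                  struct iovec *storage,
                                  void *const starts[], const size_t lens[],
                                  size_t n)
{
    assert(param != NULL);
    assert(n == 0 || (storage != NULL && starts != NULL && lens != NULL));

    for (size_t i = 0; i < n; i++) {
        storage[i].iov_base = starts[i];
        storage[i].iov_len = lens[i];
    }
    param->size = (int)sizeof(*param);
    param->vec = storage;
}
```

Under this reading, `nr_elem` would be the number of entries in `vec`, with one `behavior` value per range; that pairing is inferred from the cover letter's description rather than stated in it.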
Re: [RFC 0/7] introduce memory hinting API for external process
[linux-api]

On Mon 20-05-19 18:44:52, Matthew Wilcox wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > IMHO we should spell it out that this patchset complements MADV_DONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the
> > kernel that the memory region is not currently needed and should be
> > reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints
> > to the kernel that the memory region is not currently needed and should be
> > reclaimed when memory pressure rises.
>
> Do we tear down page tables for these ranges? That seems like a good
> way of reclaiming potentially a substantial amount of memory.

I do not think we can in general because this is a non-destructive
operation. So at least we cannot tear down anonymous ptes (they will
turn into swap entries).
-- 
Michal Hocko
SUSE Labs
Re: [RFC 0/7] introduce memory hinting API for external process
[Cc linux-api]

On Tue 21-05-19 13:39:50, Minchan Kim wrote:
> On Mon, May 20, 2019 at 12:46:05PM -0400, Johannes Weiner wrote:
> > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > > - Approach
> > >
> > > The approach we chose was to use a new interface to allow userspace to
> > > proactively reclaim entire processes by leveraging platform information.
> > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > > that are known to be cold from userspace and to avoid races with lmkd
> > > by reclaiming apps as soon as they entered the cached state. Additionally,
> > > it could give the platform many chances to use the information it has to
> > > optimize memory efficiency.
> > >
> > > IMHO we should spell it out that this patchset complements MADV_DONTNEED
> > > and MADV_FREE by adding non-destructive ways to gain some free memory
> > > space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the
> > > kernel that the memory region is not currently needed and should be
> > > reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints
> > > to the kernel that the memory region is not currently needed and should be
> > > reclaimed when memory pressure rises.
> >
> > I agree with this approach and the semantics. But these names are very
> > vague and extremely easy to confuse since they're so similar.
> >
> > MADV_COLD could be a good name, but for deactivating pages, not
> > reclaiming them - marking memory "cold" on the LRU for later reclaim.
> >
> > For the immediate reclaim one, I think there is a better option too:
> > In virtual memory speak, putting a page into secondary storage (or
> > ensuring it's already there), and then freeing its in-memory copy, is
> > called "paging out". And that's what this flag is supposed to do. So
> > how about MADV_PAGEOUT?
> >
> > With that, we'd have:
> >
> >     MADV_FREE: Mark data invalid, free memory when needed
> >     MADV_DONTNEED: Mark data invalid, free memory immediately
> >
> >     MADV_COLD: Data is not used for a while, free memory when needed
> >     MADV_PAGEOUT: Data is not used for a while, free memory immediately
> >
> > What do you think?
>
> There have been several suggestions so far. Thanks, folks!
>
> For deactivating:
>
> - MADV_COOL
> - MADV_RECLAIM_LAZY
> - MADV_DEACTIVATE
> - MADV_COLD
> - MADV_FREE_PRESERVE
>
> For reclaiming:
>
> - MADV_COLD
> - MADV_RECLAIM_NOW
> - MADV_RECLAIMING
> - MADV_PAGEOUT
> - MADV_DONTNEED_PRESERVE
>
> It seems everybody dislikes MADV_COLD, so I want to go with another.
> For consistency with the other existing madvise hints, a _PRESERVE
> suffix would suit well. However, I never liked the FREE vs. DONTNEED
> naming in the first place; the two are easily confused.
> I prefer PAGEOUT to RECLAIM since it better conveys the nuance of
> reclaim under memory pressure, with the data supposed to be paged in
> again if someone needs it later. So it implies PRESERVE.
> If there is no strong objection, I want to go with MADV_COLD and
> MADV_PAGEOUT.
>
> Any other opinions?

I do not really care strongly. I am pretty sure we will have a lot of
suggestions because people tend to be good at arguing about that...
Anyway, unlike DONTNEED/FREE we do not have any other OS implementing
these features, right? So we shouldn't be tied to existing names. On the
other hand I kinda like the reference to the existing names, but
DEACTIVATE/PAGEOUT seem a good fit to me as well. Unless a much better
name is suggested I would go with one of those. Up to you.
-- 
Michal Hocko
SUSE Labs
Re: [RFC 0/7] introduce memory hinting API for external process
On Tue, May 21, 2019 at 08:25:55AM +0530, Anshuman Khandual wrote:
>
>
> On 05/20/2019 10:29 PM, Tim Murray wrote:
> > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual wrote:
> >>
> >> Or is the objective here to reduce the number of processes which get
> >> killed by lmkd by triggering swapping for the unused memory (user hinted)
> >> sooner so that they don't get picked by lmkd? Is under-utilization of
> >> zram hardware a concern here as well?
> >
> > The objective is to avoid some instances of memory pressure by
> > proactively swapping pages that userspace knows to be cold before
> > those pages reach the end of the LRUs, which in turn can prevent some
> > apps from being killed by lmk/lmkd. As soon as Android userspace knows
> > that an application is not being used and is only resident to improve
> > performance if the user returns to that app, we can kick off
> > process_madvise on that process's pages (or some portion of those
> > pages) in a power-efficient way to reduce memory pressure long before
> > the system hits the free page watermark. This allows the system more
> > time to put pages into zram versus waiting for the watermark to
> > trigger kswapd, which decreases the likelihood that later memory
> > allocations will cause enough pressure to trigger a kill of one of
> > these apps.
>
> So this opens up a bit of LRU management to user space hints. Also, because
> the app itself won't know about the memory situation of the entire system,
> the new system call needs to be called from an external process.

That's why process_madvise is introduced here.

> >> Swapping out memory into zram won't increase the latency for a hot
> >> start? Or is it because it will prevent a fresh cold start, which anyway
> >> will be slower than a slow hot start? Just being curious.
> >
> > First, not all swapped pages will be reloaded immediately once an app
> > is resumed. We've found that an app's working set post-process_madvise
> > is significantly smaller than what an app allocates when it first
> > launches (see the delta between pswpin and pswpout in Minchan's
> > results). Presumably because of this, faulting to fetch from zram does
>
> pswpin        417613   1392647    975034   233.00
> pswpout      1274224   2661731   1387507   108.00
>
> IIUC the swap-in ratio is way higher in comparison to that of swap out. Is
> that always the case? Or does it tend to swap out from an active area of the
> working set which is then faulted back in again?

I think it's because apps stay alive longer now that fewer of them are
killed, so what used to be pgpgin turns into swap-in.

> > not seem to introduce a noticeable hot start penalty, nor does it
> > cause an increase in performance problems later in the app's
> > lifecycle. I've measured with and without process_madvise, and the
> > differences are within our noise bounds. Second, because we're not
>
> That is assuming that post process_madvise() working set for the application
> is always smaller. There is another challenge. The external process should
> ideally have the knowledge of active areas of the working set for an
> application in question for it to invoke process_madvise() correctly to
> prevent such scenarios.

There are several ways to detect the working set more accurately at the cost
of runtime overhead, for example idle page tracking or clear_refs. For LRU
aging, accuracy is always a trade-off against overhead.

> > preemptively evicting file pages and only making them more likely to
> > be evicted when there's already memory pressure, we avoid the case
> > where we process_madvise an app then immediately return to the app and
> > reload all file pages in the working set even though there was no
> > intervening memory pressure. Our initial version of this work evicted
>
> That would be the worst case scenario which should be avoided. Memory
> pressure must be a parameter before actually doing the swap out. But pages,
> if known to be inactive/cold, can be marked high priority to be swapped out.
>
> > file pages preemptively and did cause a noticeable slowdown (~15%) for
> > that case; this patch set avoids that slowdown. Finally, the benefit
> > from avoiding cold starts is huge. The performance improvement from
> > having a hot start instead of a cold start ranges from 3x for very
> > small apps to 50x+ for larger apps like high-fidelity games.
>
> Is there any other real world scenario apart from this app based ecosystem
> where user hinted LRU management might be helpful? Just being curious.
> Thanks for the detailed explanation. I will continue looking into this
> series.
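The two counter rows quoted above look like a first value, a second value, their difference, and a percentage change. A small sketch to check that reading; the name `vmstat_compare` and the interpretation of the columns are assumptions, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Difference and percentage change between two readings of one counter. */
struct vmstat_delta {
    int64_t diff; /* second - first */
    int64_t pct;  /* diff as a percentage of first, truncated */
};

/* Hypothetical helper: compare two samples of a /proc/vmstat-style counter. */
static struct vmstat_delta vmstat_compare(int64_t first, int64_t second)
{
    struct vmstat_delta d;

    assert(first > 0); /* percentage is undefined for a zero baseline */
    d.diff = second - first;
    d.pct = d.diff * 100 / first; /* integer division truncates toward zero */
    return d;
}
```

With the pswpin row read as 417613 and 1392647, this gives a difference of 975034 and 233 percent; the pswpout row (1274224 and 2661731) gives 1387507 and 108 percent. Both match the quoted figures if the percentage is truncated to an integer.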
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 06:44:52PM -0700, Matthew Wilcox wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > IMHO we should spell it out that this patchset complements MADV_DONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the
> > kernel that the memory region is not currently needed and should be
> > reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints
> > to the kernel that the memory region is not currently needed and should be
> > reclaimed when memory pressure rises.
>
> Do we tear down page tables for these ranges? That seems like a good

True for MADV_COLD (reclaiming) but false for MADV_COOL (deactivating) in
this implementation.

> way of reclaiming potentially a substantial amount of memory.

Given that refaults are spread out over time while reclaim occurs in bursts,
it does make sense to speed up the reclaiming. However, my concern is
anonymous pages, since they need swap cache insertion, which would be wasted
work if they are not reclaimed in the end.
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 12:46:05PM -0400, Johannes Weiner wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > - Approach
> >
> > The approach we chose was to use a new interface to allow userspace to
> > proactively reclaim entire processes by leveraging platform information.
> > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > that are known to be cold from userspace and to avoid races with lmkd
> > by reclaiming apps as soon as they entered the cached state. Additionally,
> > it could give the platform many chances to use the information it has to
> > optimize memory efficiency.
> >
> > IMHO we should spell it out that this patchset complements MADV_DONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the
> > kernel that the memory region is not currently needed and should be
> > reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints
> > to the kernel that the memory region is not currently needed and should be
> > reclaimed when memory pressure rises.
>
> I agree with this approach and the semantics. But these names are very
> vague and extremely easy to confuse since they're so similar.
>
> MADV_COLD could be a good name, but for deactivating pages, not
> reclaiming them - marking memory "cold" on the LRU for later reclaim.
>
> For the immediate reclaim one, I think there is a better option too:
> In virtual memory speak, putting a page into secondary storage (or
> ensuring it's already there), and then freeing its in-memory copy, is
> called "paging out". And that's what this flag is supposed to do. So
> how about MADV_PAGEOUT?
>
> With that, we'd have:
>
>     MADV_FREE: Mark data invalid, free memory when needed
>     MADV_DONTNEED: Mark data invalid, free memory immediately
>
>     MADV_COLD: Data is not used for a while, free memory when needed
>     MADV_PAGEOUT: Data is not used for a while, free memory immediately
>
> What do you think?

There have been several suggestions so far. Thanks, folks!

For deactivating:

- MADV_COOL
- MADV_RECLAIM_LAZY
- MADV_DEACTIVATE
- MADV_COLD
- MADV_FREE_PRESERVE

For reclaiming:

- MADV_COLD
- MADV_RECLAIM_NOW
- MADV_RECLAIMING
- MADV_PAGEOUT
- MADV_DONTNEED_PRESERVE

It seems everybody dislikes MADV_COLD, so I want to go with another.
For consistency with the other existing madvise hints, a _PRESERVE
suffix would suit well. However, I never liked the FREE vs. DONTNEED
naming in the first place; the two are easily confused.
I prefer PAGEOUT to RECLAIM since it better conveys the nuance of
reclaim under memory pressure, with the data supposed to be paged in
again if someone needs it later. So it implies PRESERVE.
If there is no strong objection, I want to go with MADV_COLD and
MADV_PAGEOUT.

Any other opinions?
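The destructive/non-destructive split in the list above can be observed from userspace. A minimal sketch, assuming the names MADV_COLD and MADV_PAGEOUT discussed here; the fallback values 20 and 21 are the ones later mainline headers use and are an assumption relative to this RFC, and `data_survives_hint` is a hypothetical helper. MADV_FREE is left out because its effect depends on memory pressure.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS and madvise() under strict -std= */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* ASSUMPTION: values taken from later mainline headers, not from the RFC. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/*
 * Hypothetical helper: write a pattern into one private anonymous page,
 * apply the hint, and read the page back.
 * Returns 1 if the contents are intact, 0 if they were lost, -1 if mmap fails.
 */
static int data_survives_hint(int advice)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, (size_t)page);

    /* A kernel that does not know the hint fails with EINVAL and leaves
     * the page alone, so the return value is deliberately ignored. */
    (void)madvise(p, (size_t)page, advice);

    int intact = 1;
    for (long i = 0; i < page; i++) {
        if (p[i] != 0xAB) {
            intact = 0;
            break;
        }
    }
    munmap(p, (size_t)page);
    return intact;
}
```

Calling it with MADV_DONTNEED should report the data as lost, since a private anonymous page reads back as zeroes afterwards, while MADV_COLD and MADV_PAGEOUT should leave it intact: a paged-out page is simply faulted back in on the next access.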
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 04:42:00PM +0200, Oleksandr Natalenko wrote:
> Hi.
>
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
[...]
> > - Experiment
> >
> > We did a bunch of testing with several hundred real users, not an
> > artificial benchmark, on Android. We saw about a 17% decrease in cold
> > starts without any significant battery/app startup latency issues. And
> > with artificial
Re: [RFC 0/7] introduce memory hinting API for external process
On 05/20/2019 10:29 PM, Tim Murray wrote:
> On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual wrote:
>> Or is the objective here to reduce the number of processes which get killed by lmkd by triggering swapping for the unused memory (user hinted) sooner, so that they don't get picked by lmkd? Is underutilization of zram hardware a concern here as well?
>
> The objective is to avoid some instances of memory pressure by proactively swapping pages that userspace knows to be cold before those pages reach the end of the LRUs, which in turn can prevent some apps from being killed by lmk/lmkd. As soon as Android userspace knows that an application is not being used and is only resident to improve performance if the user returns to that app, we can kick off process_madvise on that process's pages (or some portion of those pages) in a power-efficient way to reduce memory pressure long before the system hits the free page watermark. This allows the system more time to put pages into zram versus waiting for the watermark to trigger kswapd, which decreases the likelihood that later memory allocations will cause enough pressure to trigger a kill of one of these apps.

So this opens up a bit of LRU management to user space hints. Also, because the app itself won't know about the memory situation of the entire system, the new system call needs to be called from an external process.

>> Won't swapping out memory into zram increase the latency for a hot start? Or is it acceptable because it will prevent a fresh cold start, which will be slower than a slow hot start anyway? Just being curious.
>
> First, not all swapped pages will be reloaded immediately once an app is resumed. We've found that an app's working set post-process_madvise is significantly smaller than what an app allocates when it first launches (see the delta between pswpin and pswpout in Minchan's results). Presumably because of this, faulting to fetch from zram does

pswpin     417613   1392647    975034   233.00
pswpout   1274224   2661731   1387507   108.00

IIUC the swap-in ratio is way higher than that of swap-out. Is that always the case? Or does it tend to swap out from an active area of the working set, which is then faulted back in?

> not seem to introduce a noticeable hot start penalty, nor does it cause an increase in performance problems later in the app's lifecycle. I've measured with and without process_madvise, and the differences are within our noise bounds. Second, because we're not

That is assuming that the post-process_madvise() working set for the application is always smaller. There is another challenge. The external process should ideally know the active areas of the working set of the application in question, so that it can invoke process_madvise() correctly and prevent such scenarios.

> preemptively evicting file pages and only making them more likely to be evicted when there's already memory pressure, we avoid the case where we process_madvise an app then immediately return to the app and reload all file pages in the working set even though there was no intervening memory pressure. Our initial version of this work evicted

That would be the worst case scenario, which should be avoided. Memory pressure must be a parameter before actually doing the swap out. But pages, if known to be inactive/cold, can be marked high priority to be swapped out.

> file pages preemptively and did cause a noticeable slowdown (~15%) for that case; this patch set avoids that slowdown. Finally, the benefit from avoiding cold starts is huge. The performance improvement from having a hot start instead of a cold start ranges from 3x for very small apps to 50x+ for larger apps like high-fidelity games.

Is there any other real world scenario, apart from this app based ecosystem, where user hinted LRU management might be helpful? Just being curious.

Thanks for the detailed explanation. I will continue looking into this series.
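To make the delta and ratio(%) columns concrete, here is a small C sketch. It assumes that ratio(%) means the change relative to the vanilla run A, that is 100 * (B - A) / A, and it reads the quoted rows as pswpin 417613 -> 1392647 and pswpout 1274224 -> 2661731. The helper names counter_delta() and ratio_percent() are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: absolute change of a vmstat counter
 * between the vanilla run (a) and the process_madvise run (b). */
long counter_delta(long a, long b)
{
    return b - a;
}

/* Hypothetical helper: change relative to the vanilla run, in percent.
 * Assumes the table's ratio(%) column is 100 * (B - A) / A. */
double ratio_percent(long a, long b)
{
    assert(a != 0); /* a relative change needs a non-zero baseline */
    return 100.0 * (double)(b - a) / (double)a;
}
```

Under that reading, counter_delta() reproduces the quoted deltas (975034 and 1387507), and both counters grow relative to vanilla by roughly 233% and 109%, close to the 233.00 and 108.00 shown. Note that the ratio compares B against A for one counter; it does not compare swap-in against swap-out, and the absolute pswpout count in B is still larger than pswpin.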
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > IMHO we should spell it out that this patchset complements MADV_WONTNEED > and MADV_FREE by adding non-destructive ways to gain some free memory > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the > kernel that memory region is not currently needed and should be reclaimed > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the > kernel that memory region is not currently needed and should be reclaimed > when memory pressure rises. Do we tear down page tables for these ranges? That seems like a good way of reclaiming potentially a substantial amount of memory.
Re: [RFC 0/7] introduce memory hinting API for external process
On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual wrote:
> Or is the objective here to reduce the number of processes which get killed by lmkd by triggering swapping for the unused memory (user hinted) sooner, so that they don't get picked by lmkd? Is underutilization of zram hardware a concern here as well?

The objective is to avoid some instances of memory pressure by proactively swapping pages that userspace knows to be cold before those pages reach the end of the LRUs, which in turn can prevent some apps from being killed by lmk/lmkd. As soon as Android userspace knows that an application is not being used and is only resident to improve performance if the user returns to that app, we can kick off process_madvise on that process's pages (or some portion of those pages) in a power-efficient way to reduce memory pressure long before the system hits the free page watermark. This allows the system more time to put pages into zram versus waiting for the watermark to trigger kswapd, which decreases the likelihood that later memory allocations will cause enough pressure to trigger a kill of one of these apps.

> Won't swapping out memory into zram increase the latency for a hot start? Or is it acceptable because it will prevent a fresh cold start, which will be slower than a slow hot start anyway? Just being curious.

First, not all swapped pages will be reloaded immediately once an app is resumed. We've found that an app's working set post-process_madvise is significantly smaller than what an app allocates when it first launches (see the delta between pswpin and pswpout in Minchan's results). Presumably because of this, faulting to fetch from zram does not seem to introduce a noticeable hot start penalty, nor does it cause an increase in performance problems later in the app's lifecycle. I've measured with and without process_madvise, and the differences are within our noise bounds.
Second, because we're not preemptively evicting file pages and only making them more likely to be evicted when there's already memory pressure, we avoid the case where we process_madvise an app then immediately return to the app and reload all file pages in the working set even though there was no intervening memory pressure. Our initial version of this work evicted file pages preemptively and did cause a noticeable slowdown (~15%) for that case; this patch set avoids that slowdown. Finally, the benefit from avoiding cold starts is huge. The performance improvement from having a hot start instead of a cold start ranges from 3x for very small apps to 50x+ for larger apps like high-fidelity games.
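For readers who want to see what "kicking off process_madvise on that process's pages" could look like from a monitor, here is a hedged C sketch. It does not use the pr_madvise_param signature proposed in this RFC. Instead it assumes the (pidfd, iovec, vlen, advice, flags) form of process_madvise and the pidfd_open syscall as found in later mainline kernels, called through syscall(2) with fallback syscall numbers 434 and 440 (an assumption that holds on most architectures). hint_range_cold() is a hypothetical helper, and MADV_COLD is assumed to be advice value 20. The address must be valid in the target process; how a monitor learns the ranges (for example from /proc/<pid>/maps) is outside this sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers; assumed values, see the note above. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/* Hypothetical helper: ask the kernel to treat one address range of
 * process `pid` as cold. Returns the number of bytes advised on
 * success, or a negative errno value (for example -ENOSYS on kernels
 * without these syscalls, or -EPERM without sufficient privilege). */
long hint_range_cold(pid_t pid, void *remote_addr, size_t len)
{
    assert(len > 0);

    /* Obtain a pidfd so the hint cannot land on a recycled PID. */
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0U);
    if (pidfd < 0)
        return -errno;

    /* iov_base is an address in the target's address space. */
    struct iovec range = { .iov_base = remote_addr, .iov_len = len };
    long n = syscall(SYS_process_madvise, pidfd, &range, (size_t)1,
                     MADV_COLD, 0U);
    long result = (n < 0) ? -errno : n; /* save errno before close() */

    close(pidfd);
    return result;
}
```

A monitor would call hint_range_cold(pid, addr, len) once per range it considers cold. Because MADV_COLD only deactivates pages, returning to the app without intervening memory pressure should not force the pages to be reloaded, which is the case Tim describes above.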
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> - Approach
>
> The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could give the platform many chances to use the information it has to optimize memory efficiency.
>
> IMHO we should spell it out that this patchset complements MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the kernel that the memory region is not currently needed and should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints to the kernel that the memory region is not currently needed and should be reclaimed when memory pressure rises.

I agree with this approach and the semantics. But these names are very vague and extremely easy to confuse since they're so similar.

MADV_COLD could be a good name, but for deactivating pages, not reclaiming them - marking memory "cold" on the LRU for later reclaim.

For the immediate reclaim one, I think there is a better option too: In virtual memory speak, putting a page into secondary storage (or ensuring it's already there), and then freeing its in-memory copy, is called "paging out". And that's what this flag is supposed to do. So how about MADV_PAGEOUT?

With that, we'd have:

MADV_FREE: Mark data invalid, free memory when needed
MADV_DONTNEED: Mark data invalid, free memory immediately
MADV_COLD: Data is not used for a while, free memory when needed
MADV_PAGEOUT: Data is not used for a while, free memory immediately

What do you think?
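As a rough illustration of the two non-destructive rows in the list above, here is a minimal C sketch that applies a hint to a private anonymous mapping and checks that the contents survive. It assumes the kernel and headers expose the hints under the names proposed here, MADV_COLD and MADV_PAGEOUT, with the values 20 and 21 that Linux kernels shipping these hints use on common architectures. hint_preserves_data() is a hypothetical helper, not part of the patchset. On a kernel without these hints, madvise() is expected to fail (typically with EINVAL), which the sketch reports rather than treats as fatal.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; assumed values, see the note above. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/* Hypothetical helper: fill a fresh anonymous mapping, apply `advice`
 * to it, then verify that the data is still there.
 * Returns  0 if the hint was accepted and the contents are intact,
 *          1 if the kernel rejected the hint,
 *         -1 if mmap failed or the contents changed. */
int hint_preserves_data(int advice, size_t len)
{
    assert(len > 0);

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len); /* touch every page so it is resident */

    int result;
    if (madvise(p, len, advice) != 0) {
        result = 1; /* hint not supported or not permitted here */
    } else {
        result = 0;
        /* Reading faults pages back in if they were paged out. */
        for (size_t i = 0; i < len; i++) {
            if (p[i] != 0xAB) {
                result = -1;
                break;
            }
        }
    }

    munmap(p, len);
    return result;
}
```

Calling hint_preserves_data(MADV_COLD, len) and hint_preserves_data(MADV_PAGEOUT, len) should never return -1. That is the contrast with MADV_DONTNEED and MADV_FREE in the list: after those, an anonymous private range may read back as zeroes.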
Re: [RFC 0/7] introduce memory hinting API for external process
Hi. On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > - Background > > The Android terminology used for forking a new process and starting an app > from scratch is a cold start, while resuming an existing app is a hot start. > While we continually try to improve the performance of cold starts, hot > starts will always be significantly less power hungry as well as faster so > we are trying to make hot start more likely than cold start. > > To increase hot start, Android userspace manages the order that apps should > be killed in a process called ActivityManagerService. ActivityManagerService > tracks every Android app or service that the user could be interacting with > at any time and translates that into a ranked list for lmkd(low memory > killer daemon). They are likely to be killed by lmkd if the system has to > reclaim memory. In that sense they are similar to entries in any other cache. > Those apps are kept alive for opportunistic performance improvements but > those performance improvements will vary based on the memory requirements of > individual workloads. > > - Problem > > Naturally, cached apps were dominant consumers of memory on the system. > However, they were not significant consumers of swap even though they are > good candidate for swap. Under investigation, swapping out only begins > once the low zone watermark is hit and kswapd wakes up, but the overall > allocation rate in the system might trip lmkd thresholds and cause a cached > process to be killed(we measured performance swapping out vs. zapping the > memory by killing a process. Unsurprisingly, zapping is 10x times faster > even though we use zram which is much faster than real storage) so kill > from lmkd will often satisfy the high zone watermark, resulting in very > few pages actually being moved to swap. > > - Approach > > The approach we chose was to use a new interface to allow userspace to > proactively reclaim entire processes by leveraging platform information. 
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages > that are known to be cold from userspace and to avoid races with lmkd > by reclaiming apps as soon as they entered the cached state. Additionally, > it could provide many chances for platform to use much information to > optimize memory efficiency. > > IMHO we should spell it out that this patchset complements MADV_WONTNEED > and MADV_FREE by adding non-destructive ways to gain some free memory > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the > kernel that memory region is not currently needed and should be reclaimed > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the > kernel that memory region is not currently needed and should be reclaimed > when memory pressure rises. > > To achieve the goal, the patchset introduce two new options for madvise. > One is MADV_COOL which will deactive activated pages and the other is > MADV_COLD which will reclaim private pages instantly. These new options > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way > that it hints the kernel that memory region is not currently needed and > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way > that it hints the kernel that memory region is not currently needed and > should be reclaimed when memory pressure rises. > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the > information required to make the reclaim decision is not known to the app. > Instead, it is known to a centralized userspace daemon, and that daemon > must be able to initiate reclaim on its own without any app involvement. 
> To solve the concern, this patch introduces new syscall - > > struct pr_madvise_param { > int size; > const struct iovec *vec; > } > > int process_madvise(int pidfd, ssize_t nr_elem, int *behavior, > struct pr_madvise_param *restuls, > struct pr_madvise_param *ranges, > unsigned long flags); > > The syscall get pidfd to give hints to external process and provides > pair of result/ranges vector arguments so that it could give several > hints to each address range all at once. > > I guess others have different ideas about the naming of syscall and options > so feel free to suggest better naming. > > - Experiment > > We did bunch of testing with several hundreds of real users, not artificial > benchmark on android. We saw about 17% cold start decreasement without any > significant battery/app startup latency issues. And with artificial benchmark > which launches and switching apps, we saw average 7% app launching > improvement, > 18% less lmkd kill and good stat from vmstat. > > A is vanilla and B is process_madvise. > > >A B
Re: [RFC 0/7] introduce memory hinting API for external process
[Cc linux-api] On Mon 20-05-19 12:52:47, Minchan Kim wrote: > - Background > > The Android terminology used for forking a new process and starting an app > from scratch is a cold start, while resuming an existing app is a hot start. > While we continually try to improve the performance of cold starts, hot > starts will always be significantly less power hungry as well as faster so > we are trying to make hot start more likely than cold start. > > To increase hot start, Android userspace manages the order that apps should > be killed in a process called ActivityManagerService. ActivityManagerService > tracks every Android app or service that the user could be interacting with > at any time and translates that into a ranked list for lmkd(low memory > killer daemon). They are likely to be killed by lmkd if the system has to > reclaim memory. In that sense they are similar to entries in any other cache. > Those apps are kept alive for opportunistic performance improvements but > those performance improvements will vary based on the memory requirements of > individual workloads. > > - Problem > > Naturally, cached apps were dominant consumers of memory on the system. > However, they were not significant consumers of swap even though they are > good candidate for swap. Under investigation, swapping out only begins > once the low zone watermark is hit and kswapd wakes up, but the overall > allocation rate in the system might trip lmkd thresholds and cause a cached > process to be killed(we measured performance swapping out vs. zapping the > memory by killing a process. Unsurprisingly, zapping is 10x times faster > even though we use zram which is much faster than real storage) so kill > from lmkd will often satisfy the high zone watermark, resulting in very > few pages actually being moved to swap. > > - Approach > > The approach we chose was to use a new interface to allow userspace to > proactively reclaim entire processes by leveraging platform information. 
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages > that are known to be cold from userspace and to avoid races with lmkd > by reclaiming apps as soon as they entered the cached state. Additionally, > it could provide many chances for platform to use much information to > optimize memory efficiency. > > IMHO we should spell it out that this patchset complements MADV_WONTNEED > and MADV_FREE by adding non-destructive ways to gain some free memory > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the > kernel that memory region is not currently needed and should be reclaimed > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the > kernel that memory region is not currently needed and should be reclaimed > when memory pressure rises. > > To achieve the goal, the patchset introduce two new options for madvise. > One is MADV_COOL which will deactive activated pages and the other is > MADV_COLD which will reclaim private pages instantly. These new options > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way > that it hints the kernel that memory region is not currently needed and > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way > that it hints the kernel that memory region is not currently needed and > should be reclaimed when memory pressure rises. > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the > information required to make the reclaim decision is not known to the app. > Instead, it is known to a centralized userspace daemon, and that daemon > must be able to initiate reclaim on its own without any app involvement. 
> To solve the concern, this patch introduces new syscall - > > struct pr_madvise_param { > int size; > const struct iovec *vec; > } > > int process_madvise(int pidfd, ssize_t nr_elem, int *behavior, > struct pr_madvise_param *restuls, > struct pr_madvise_param *ranges, > unsigned long flags); > > The syscall get pidfd to give hints to external process and provides > pair of result/ranges vector arguments so that it could give several > hints to each address range all at once. > > I guess others have different ideas about the naming of syscall and options > so feel free to suggest better naming. > > - Experiment > > We did bunch of testing with several hundreds of real users, not artificial > benchmark on android. We saw about 17% cold start decreasement without any > significant battery/app startup latency issues. And with artificial benchmark > which launches and switching apps, we saw average 7% app launching > improvement, > 18% less lmkd kill and good stat from vmstat. > > A is vanilla and B is process_madvise. > > >A B
Re: [RFC 0/7] introduce memory hinting API for external process
On 05/20/2019 09:22 AM, Minchan Kim wrote:
> - Problem
>
> Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidates for swap. On investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed. (We measured the performance of swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x faster, even though we use zram, which is much faster than real storage.) So a kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap.

Is the problem that apps get killed by lmkd, which is triggered by custom system memory allocation parameters, and hence never get the chance to swap out? But isn't that problem created by lmkd itself?

Or is the objective here to reduce the number of processes which get killed by lmkd by triggering swapping for the unused memory (user hinted) sooner, so that they don't get picked by lmkd? Is underutilization of zram hardware a concern here as well?

Won't swapping out memory into zram increase the latency for a hot start? Or is it acceptable because it will prevent a fresh cold start, which will be slower than a slow hot start anyway? Just being curious.
[RFC 0/7] introduce memory hinting API for external process
- Background

The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster, so we are trying to make hot starts more likely than cold starts.

To increase hot starts, Android userspace manages the order in which apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd (low memory killer daemon). These apps are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements, but those performance improvements will vary based on the memory requirements of individual workloads.

- Problem

Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidates for swap. On investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed. (We measured the performance of swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x faster, even though we use zram, which is much faster than real storage.) So a kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap.

- Approach

The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information.
This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could give the platform many chances to use the information it has to optimize memory efficiency.

IMHO we should spell it out that this patchset complements MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the kernel that the memory region is not currently needed and should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints to the kernel that the memory region is not currently needed and should be reclaimed when memory pressure rises.

To achieve the goal, the patchset introduces two new options for madvise. One is MADV_COOL, which will deactivate active pages, and the other is MADV_COLD, which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the kernel that the memory region is not currently needed and should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints to the kernel that the memory region is not currently needed and should be reclaimed when memory pressure rises.

This approach is similar in spirit to madvise(MADV_DONTNEED), but the information required to make the reclaim decision is not known to the app. Instead, it is known to a centralized userspace daemon, and that daemon must be able to initiate reclaim on its own without any app involvement.
To solve the concern, this patch introduces a new syscall -

	struct pr_madvise_param {
		int size;
		const struct iovec *vec;
	};

	int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
			struct pr_madvise_param *results,
			struct pr_madvise_param *ranges,
			unsigned long flags);

The syscall takes a pidfd to give hints to an external process and provides a pair of result/ranges vector arguments so that it could give several hints to each address range all at once.

I guess others have different ideas about the naming of the syscall and options, so feel free to suggest better naming.

- Experiment

We did a bunch of testing with several hundred real users, not an artificial benchmark, on Android. We saw about a 17% decrease in cold starts without any significant battery or app startup latency issues. And with an artificial benchmark which launches and switches apps, we saw an average 7% app launch improvement, 18% fewer lmkd kills and good stats from vmstat.

A is vanilla and B is process_madvise.

                            A          B      delta   ratio(%)
allocstall_dma              0          0          0       0.00
allocstall_movable       1464        457      -1007     -69.00
allocstall_normal      263210     190763     -72447