Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-09 Thread Greg KH
On Thu, Mar 09, 2017 at 11:00:07AM +0100, Michal Hocko wrote:
> On Thu 09-03-17 10:30:28, Greg KH wrote:
> > On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> > > Greg, do you see any obstacle to have this merged. The discussion so far
> > > shown that a) vendors are not using the code as is b) there seems to be
> > > an agreement that something else than we have in the kernel is really
> > > needed.
> > 
> > Well, some vendors are using the code as-is, just not Sony...
> > 
> > I think the ideas that Tim wrote about is the best way forward for this.
> > I'd prefer to leave the code in the kernel until that solution is
> > integrated, as dropping support entirely isn't very nice.
> > 
> > But, given that almost no Android system is running mainline at the
> > moment, I will queue this patch up for 4.12-rc1, which will give the
> > Google people a bit more of an incentive to get their solution
> > implemented and working and merged :)
> > 
> > Sound reasonable?
> 
> sounds good to me.

Oh nevermind, might as well commit it now to my -next branch.  All now
deleted :)

greg k-h


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-09 Thread Greg KH
On Thu, Mar 09, 2017 at 11:00:07AM +0100, Michal Hocko wrote:
> On Thu 09-03-17 10:30:28, Greg KH wrote:
> > On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> > > Greg, do you see any obstacle to have this merged. The discussion so far
> > > shown that a) vendors are not using the code as is b) there seems to be
> > > an agreement that something else than we have in the kernel is really
> > > needed.
> > 
> > Well, some vendors are using the code as-is, just not Sony...
> > 
> > I think the ideas that Tim wrote about is the best way forward for this.
> > I'd prefer to leave the code in the kernel until that solution is
> > integrated, as dropping support entirely isn't very nice.
> > 
> > But, given that almost no Android system is running mainline at the
> > moment, I will queue this patch up for 4.12-rc1, which will give the
> > Google people a bit more of an incentive to get their solution
> > implemented and working and merged :)
> > 
> > Sound reasonable?
> 
> sounds good to me.

Oh nevermind, might as well commit it now to my -next branch.  All now
deleted :)

greg k-h


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-09 Thread Michal Hocko
On Thu 09-03-17 10:30:28, Greg KH wrote:
> On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> > Greg, do you see any obstacle to have this merged. The discussion so far
> > shown that a) vendors are not using the code as is b) there seems to be
> > an agreement that something else than we have in the kernel is really
> > needed.
> 
> Well, some vendors are using the code as-is, just not Sony...
> 
> I think the ideas that Tim wrote about is the best way forward for this.
> I'd prefer to leave the code in the kernel until that solution is
> integrated, as dropping support entirely isn't very nice.
> 
> But, given that almost no Android system is running mainline at the
> moment, I will queue this patch up for 4.12-rc1, which will give the
> Google people a bit more of an incentive to get their solution
> implemented and working and merged :)
> 
> Sound reasonable?

sounds good to me.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-09 Thread Michal Hocko
On Thu 09-03-17 10:30:28, Greg KH wrote:
> On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> > Greg, do you see any obstacle to have this merged. The discussion so far
> > shown that a) vendors are not using the code as is b) there seems to be
> > an agreement that something else than we have in the kernel is really
> > needed.
> 
> Well, some vendors are using the code as-is, just not Sony...
> 
> I think the ideas that Tim wrote about is the best way forward for this.
> I'd prefer to leave the code in the kernel until that solution is
> integrated, as dropping support entirely isn't very nice.
> 
> But, given that almost no Android system is running mainline at the
> moment, I will queue this patch up for 4.12-rc1, which will give the
> Google people a bit more of an incentive to get their solution
> implemented and working and merged :)
> 
> Sound reasonable?

sounds good to me.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-09 Thread Greg KH
On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> Greg, do you see any obstacle to have this merged. The discussion so far
> shown that a) vendors are not using the code as is b) there seems to be
> an agreement that something else than we have in the kernel is really
> needed.

Well, some vendors are using the code as-is, just not Sony...

I think the ideas that Tim wrote about is the best way forward for this.
I'd prefer to leave the code in the kernel until that solution is
integrated, as dropping support entirely isn't very nice.

But, given that almost no Android system is running mainline at the
moment, I will queue this patch up for 4.12-rc1, which will give the
Google people a bit more of an incentive to get their solution
implemented and working and merged :)

Sound reasonable?  I haven't started to go through my patch queue for
4.12-rc1 stuff just yet, still digging through it for 4.11-final things
at the moment.  Give me a week or so to catch up.

thanks,

greg k-h


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-09 Thread Greg KH
On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> Greg, do you see any obstacle to have this merged. The discussion so far
> shown that a) vendors are not using the code as is b) there seems to be
> an agreement that something else than we have in the kernel is really
> needed.

Well, some vendors are using the code as-is, just not Sony...

I think the ideas that Tim wrote about is the best way forward for this.
I'd prefer to leave the code in the kernel until that solution is
integrated, as dropping support entirely isn't very nice.

But, given that almost no Android system is running mainline at the
moment, I will queue this patch up for 4.12-rc1, which will give the
Google people a bit more of an incentive to get their solution
implemented and working and merged :)

Sound reasonable?  I haven't started to go through my patch queue for
4.12-rc1 stuff just yet, still digging through it for 4.11-final things
at the moment.  Give me a week or so to catch up.

thanks,

greg k-h


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-09 Thread Michal Hocko
Greg, do you see any obstacle to have this merged. The discussion so far
shown that a) vendors are not using the code as is b) there seems to be
an agreement that something else than we have in the kernel is really
needed.

On Wed 22-02-17 13:01:21, Michal Hocko wrote:
> From: Michal Hocko 
> 
> Lowmemory killer is sitting in the staging tree since 2008 without any
> serious interest for fixing issues brought up by the MM folks. The main
> objection is that the implementation is basically broken by design:
>   - it hooks into slab shrinker API which is not suitable for this
> purpose. lowmem_count implementation just shows this nicely.
> There is no scaling based on the memory pressure and no
> feedback to the generic shrinker infrastructure.
> Moreover lowmem_scan is called way too often for the heavy
> work it performs.
>   - it is not reclaim context aware - no NUMA and/or memcg
> awareness.
> 
> As the code stands right now it just adds a maintenance overhead when
> core MM changes have to update lowmemorykiller.c as well. It also seems
> that the alternative LMK implementation will be solely in the userspace
> so this code has no perspective it seems. The staging tree is supposed
> to be for a code which needs to be put in shape before it can be merged
> which is not the case here obviously.
> 
> Signed-off-by: Michal Hocko 
> ---
>  drivers/staging/android/Kconfig   |  10 --
>  drivers/staging/android/Makefile  |   1 -
>  drivers/staging/android/lowmemorykiller.c | 212 
> --
>  include/linux/sched.h |   4 -
>  4 files changed, 227 deletions(-)
>  delete mode 100644 drivers/staging/android/lowmemorykiller.c
> 
> diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
> index 6c00d6f765c6..71a50b99caff 100644
> --- a/drivers/staging/android/Kconfig
> +++ b/drivers/staging/android/Kconfig
> @@ -14,16 +14,6 @@ config ASHMEM
> It is, in theory, a good memory allocator for low-memory devices,
> because it can discard shared memory units when under memory pressure.
>  
> -config ANDROID_LOW_MEMORY_KILLER
> - bool "Android Low Memory Killer"
> - ---help---
> -   Registers processes to be killed when low memory conditions, this is 
> useful
> -   as there is no particular swap space on android.
> -
> -   The registered process will kill according to the priorities in 
> android init
> -   scripts (/init.rc), and it defines priority values with minimum free 
> memory size
> -   for each priority.
> -
>  source "drivers/staging/android/ion/Kconfig"
>  
>  endif # if ANDROID
> diff --git a/drivers/staging/android/Makefile 
> b/drivers/staging/android/Makefile
> index 7ed1be798909..7cf1564a49a5 100644
> --- a/drivers/staging/android/Makefile
> +++ b/drivers/staging/android/Makefile
> @@ -3,4 +3,3 @@ ccflags-y += -I$(src) # needed for trace 
> events
>  obj-y+= ion/
>  
>  obj-$(CONFIG_ASHMEM) += ashmem.o
> -obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)  += lowmemorykiller.o
> diff --git a/drivers/staging/android/lowmemorykiller.c 
> b/drivers/staging/android/lowmemorykiller.c
> deleted file mode 100644
> index ec3b66561412..
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ /dev/null
> @@ -1,212 +0,0 @@
> -/* drivers/misc/lowmemorykiller.c
> - *
> - * The lowmemorykiller driver lets user-space specify a set of memory 
> thresholds
> - * where processes with a range of oom_score_adj values will get killed. 
> Specify
> - * the minimum oom_score_adj values in
> - * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
> - * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
> - * separated list of numbers in ascending order.
> - *
> - * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
> - * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill
> - * processes with a oom_score_adj value of 8 or higher when the free memory
> - * drops below 4096 pages and kill processes with a oom_score_adj value of 0 
> or
> - * higher when the free memory drops below 1024 pages.
> - *
> - * The driver considers memory used for caches to be free, but if a large
> - * percentage of the cached memory is locked this can be very inaccurate
> - * and processes may not get killed until the normal oom killer is triggered.
> - *
> - * Copyright (C) 2007-2008 Google, Inc.
> - *
> - * This software is licensed under the terms of the GNU General Public
> - * License version 2, as published by the Free Software Foundation, and
> - * may be copied, distributed, and modified under those terms.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or 

Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-09 Thread Michal Hocko
Greg, do you see any obstacle to have this merged. The discussion so far
shown that a) vendors are not using the code as is b) there seems to be
an agreement that something else than we have in the kernel is really
needed.

On Wed 22-02-17 13:01:21, Michal Hocko wrote:
> From: Michal Hocko 
> 
> Lowmemory killer is sitting in the staging tree since 2008 without any
> serious interest for fixing issues brought up by the MM folks. The main
> objection is that the implementation is basically broken by design:
>   - it hooks into slab shrinker API which is not suitable for this
> purpose. lowmem_count implementation just shows this nicely.
> There is no scaling based on the memory pressure and no
> feedback to the generic shrinker infrastructure.
> Moreover lowmem_scan is called way too often for the heavy
> work it performs.
>   - it is not reclaim context aware - no NUMA and/or memcg
> awareness.
> 
> As the code stands right now it just adds a maintenance overhead when
> core MM changes have to update lowmemorykiller.c as well. It also seems
> that the alternative LMK implementation will be solely in the userspace
> so this code has no perspective it seems. The staging tree is supposed
> to be for a code which needs to be put in shape before it can be merged
> which is not the case here obviously.
> 
> Signed-off-by: Michal Hocko 
> ---
>  drivers/staging/android/Kconfig   |  10 --
>  drivers/staging/android/Makefile  |   1 -
>  drivers/staging/android/lowmemorykiller.c | 212 
> --
>  include/linux/sched.h |   4 -
>  4 files changed, 227 deletions(-)
>  delete mode 100644 drivers/staging/android/lowmemorykiller.c
> 
> diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
> index 6c00d6f765c6..71a50b99caff 100644
> --- a/drivers/staging/android/Kconfig
> +++ b/drivers/staging/android/Kconfig
> @@ -14,16 +14,6 @@ config ASHMEM
> It is, in theory, a good memory allocator for low-memory devices,
> because it can discard shared memory units when under memory pressure.
>  
> -config ANDROID_LOW_MEMORY_KILLER
> - bool "Android Low Memory Killer"
> - ---help---
> -   Registers processes to be killed when low memory conditions, this is 
> useful
> -   as there is no particular swap space on android.
> -
> -   The registered process will kill according to the priorities in 
> android init
> -   scripts (/init.rc), and it defines priority values with minimum free 
> memory size
> -   for each priority.
> -
>  source "drivers/staging/android/ion/Kconfig"
>  
>  endif # if ANDROID
> diff --git a/drivers/staging/android/Makefile 
> b/drivers/staging/android/Makefile
> index 7ed1be798909..7cf1564a49a5 100644
> --- a/drivers/staging/android/Makefile
> +++ b/drivers/staging/android/Makefile
> @@ -3,4 +3,3 @@ ccflags-y += -I$(src) # needed for trace 
> events
>  obj-y+= ion/
>  
>  obj-$(CONFIG_ASHMEM) += ashmem.o
> -obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)  += lowmemorykiller.o
> diff --git a/drivers/staging/android/lowmemorykiller.c 
> b/drivers/staging/android/lowmemorykiller.c
> deleted file mode 100644
> index ec3b66561412..
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ /dev/null
> @@ -1,212 +0,0 @@
> -/* drivers/misc/lowmemorykiller.c
> - *
> - * The lowmemorykiller driver lets user-space specify a set of memory 
> thresholds
> - * where processes with a range of oom_score_adj values will get killed. 
> Specify
> - * the minimum oom_score_adj values in
> - * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
> - * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
> - * separated list of numbers in ascending order.
> - *
> - * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
> - * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill
> - * processes with a oom_score_adj value of 8 or higher when the free memory
> - * drops below 4096 pages and kill processes with a oom_score_adj value of 0 
> or
> - * higher when the free memory drops below 1024 pages.
> - *
> - * The driver considers memory used for caches to be free, but if a large
> - * percentage of the cached memory is locked this can be very inaccurate
> - * and processes may not get killed until the normal oom killer is triggered.
> - *
> - * Copyright (C) 2007-2008 Google, Inc.
> - *
> - * This software is licensed under the terms of the GNU General Public
> - * License version 2, as published by the Free Software Foundation, and
> - * may be copied, distributed, and modified under those terms.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 

Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-03 Thread Tim Murray
Hi all,

I mentioned before that I had some ideas to overhaul lowmemorykiller,
which would have the side effect of getting it out of the kernel. I've
been working through some prototypes over the past few weeks (actually
started before Michal sent his patch out), and I'd appreciate some
feedback on what I'd like to do so I can start working on more
complete patches.

First, Michal has mentioned why the current lowmemorykiller
implementation is bad. However, the design and implementation of
lowmemorykiller is bad for Android users as well. Rather than fixing
lowmemorykiller in the kernel or enabling an equivalent
reimplementation of lowmemorykiller in userspace, I think we can solve
the Android problems and remove lowmemorykiller from the tree at the
same time.

What's wrong with lowmemorykiller from an Android user's POV?

1. lowmemorykiller can be way too aggressive when there are transient
spikes in memory consumption. LMK relies on hand-tuned thresholds to
determine when to kill a process, but hitting the threshold shouldn't
always imply a kill. For example, on some current high-end Android
devices, lowmemorykiller will start to kill oom_score_adj 200
processes once there is less than 112MB in the page cache and less
than 112MB of free pages. oom_score_adj 200 is used for processes that
are important and visible to the user but not the currently-used
foreground app; music playback or camera post-processing for some apps
usually runs as adj 200. This threshold means that even if the system
would quiesce at 110MB in the page cache and 110MB of free pages,
something important to the user may die. This is bad!

2. lowmemorykiller can be way too passive on lower memory devices.
Because lowmemorykiller has a shared threshold for the amount of free
pages and the size of the page cache before it will kill a process,
there is a scenario that we hit all the time that results in low
memory devices becoming unusable. Assume the current application and
supporting system software need X bytes in the page cache in order to
provide reasonable UI performance, and X is larger than the zone_high
watermark that stops kswapd. The number of free pages can drop below
zone_low and kswapd will start evicting pages from the page cache;
however, because the working set is actually of size X, those pages
will be paged back in about as quickly as they can be paged out. This
manifests as kswapd constantly evicting file pages and the foreground
UI workload constantly waiting on page faults. Meanwhile, even if
there are very unimportant processes to kill, lowmemorykiller won't do
anything to kill them.

#2 can be addressed somewhat by separating the limits for number of
free pages and the size of the page cache, but then lowmemorykiller
would have two sets of arbitrary hand-tuned values and still no
knowledge of kswapd/reclaim. It doesn't make sense to do that if we
can avoid it.

We have plenty of evidence for both of these on real Android devices.
I'm bringing up these issues to not only explain the problems that
we'd like to solve, but also to provide some evidence that we're
serious about fixing lowmemorykiller once and for all.

Here's where I'd like to go.

First of all, lowmemorykiller should not be in the kernel, and Android
should move to per-app mem cgroups and kill unnecessary background
tasks when under memory pressure from userspace, not the kernel.

Second, Android has good knowledge of what's important to the user and
what's not. I'd like the ability to use that information to drive
decisions about reclaiming memory, so kswapd can shrink the mem
cgroups associated with background tasks before moving on to
foreground tasks. As far as I can tell, what I'm suggesting isn't a
soft limit or something like that. We don't have specific limits on
memory consumption for particular processes, and there's no size we
definitely want to get background processes to via reclaim before we
start reclaiming from foreground or persistent processes. In practice,
I think this looks like a per-memory-cgroup reclaim priority. I have a
prototype of this where I've added a new knob called memory.priority
from 0 to 10 that serves two purposes:

- Skip reclaiming from higher-priority cgroups entirely until the
priority from shrink_zone is high enough.
- Reduce the number of pages scanned from higher-priority cgroups once
they are eligible for reclamation.

This would let kswapd reclaim from applications that aren't critical
to the user while still occasionally reclaiming from persistent
processes (evicting pages that are used very rarely from
always-running system processes). This would effectively reduce the
size of backgrounded applications without impacting UI performance--a
huge improvement over what Android can do today.

Third, assuming we can do this kind of prioritized reclaim, I'd like
more information available via vmpressure (or similar) about the
current state of kswapd in terms of what priority it's trying to
reclaim. If lmkd knew 

Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-03-03 Thread Tim Murray
Hi all,

I mentioned before that I had some ideas to overhaul lowmemorykiller,
which would have the side effect of getting it out of the kernel. I've
been working through some prototypes over the past few weeks (actually
started before Michal sent his patch out), and I'd appreciate some
feedback on what I'd like to do so I can start working on more
complete patches.

First, Michal has mentioned why the current lowmemorykiller
implementation is bad. However, the design and implementation of
lowmemorykiller is bad for Android users as well. Rather than fixing
lowmemorykiller in the kernel or enabling an equivalent
reimplementation of lowmemorykiller in userspace, I think we can solve
the Android problems and remove lowmemorykiller from the tree at the
same time.

What's wrong with lowmemorykiller from an Android user's POV?

1. lowmemorykiller can be way too aggressive when there are transient
spikes in memory consumption. LMK relies on hand-tuned thresholds to
determine when to kill a process, but hitting the threshold shouldn't
always imply a kill. For example, on some current high-end Android
devices, lowmemorykiller will start to kill oom_score_adj 200
processes once there is less than 112MB in the page cache and less
than 112MB of free pages. oom_score_adj 200 is used for processes that
are important and visible to the user but not the currently-used
foreground app; music playback or camera post-processing for some apps
usually runs as adj 200. This threshold means that even if the system
would quiesce at 110MB in the page cache and 110MB of free pages,
something important to the user may die. This is bad!

2. lowmemorykiller can be way too passive on lower memory devices.
Because lowmemorykiller has a shared threshold for the amount of free
pages and the size of the page cache before it will kill a process,
there is a scenario that we hit all the time that results in low
memory devices becoming unusable. Assume the current application and
supporting system software need X bytes in the page cache in order to
provide reasonable UI performance, and X is larger than the zone_high
watermark that stops kswapd. The number of free pages can drop below
zone_low and kswapd will start evicting pages from the page cache;
however, because the working set is actually of size X, those pages
will be paged back in about as quickly as they can be paged out. This
manifests as kswapd constantly evicting file pages and the foreground
UI workload constantly waiting on page faults. Meanwhile, even if
there are very unimportant processes to kill, lowmemorykiller won't do
anything to kill them.

#2 can be addressed somewhat by separating the limits for number of
free pages and the size of the page cache, but then lowmemorykiller
would have two sets of arbitrary hand-tuned values and still no
knowledge of kswapd/reclaim. It doesn't make sense to do that if we
can avoid it.

We have plenty of evidence for both of these on real Android devices.
I'm bringing up these issues to not only explain the problems that
we'd like to solve, but also to provide some evidence that we're
serious about fixing lowmemorykiller once and for all.

Here's where I'd like to go.

First of all, lowmemorykiller should not be in the kernel, and Android
should move to per-app mem cgroups and kill unnecessary background
tasks when under memory pressure from userspace, not the kernel.

Second, Android has good knowledge of what's important to the user and
what's not. I'd like the ability to use that information to drive
decisions about reclaiming memory, so kswapd can shrink the mem
cgroups associated with background tasks before moving on to
foreground tasks. As far as I can tell, what I'm suggesting isn't a
soft limit or something like that. We don't have specific limits on
memory consumption for particular processes, and there's no size we
definitely want to get background processes to via reclaim before we
start reclaiming from foreground or persistent processes. In practice,
I think this looks like a per-memory-cgroup reclaim priority. I have a
prototype of this where I've added a new knob called memory.priority
from 0 to 10 that serves two purposes:

- Skip reclaiming from higher-priority cgroups entirely until the
priority from shrink_zone is high enough.
- Reduce the number of pages scanned from higher-priority cgroups once
they are eligible for reclamation.

This would let kswapd reclaim from applications that aren't critical
to the user while still occasionally reclaiming from persistent
processes (evicting pages that are used very rarely from
always-running system processes). This would effectively reduce the
size of backgrounded applications without impacting UI performance--a
huge improvement over what Android can do today.

Third, assuming we can do this kind of prioritized reclaim, I'd like
more information available via vmpressure (or similar) about the
current state of kswapd in terms of what priority it's trying to
reclaim. If lmkd knew 

Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Fri 24-02-17 16:40:13, peter enderborg wrote:
> On 02/24/2017 04:03 PM, Michal Hocko wrote:
> > On Fri 24-02-17 15:42:49, peter enderborg wrote:
> >> On 02/24/2017 03:11 PM, Michal Hocko wrote:
> >>> On Fri 24-02-17 14:16:34, peter enderborg wrote:
>  On 02/24/2017 01:28 PM, Michal Hocko wrote:
> >>> [...]
> > Yeah, I strongly believe that the chosen approach is completely wrong.
> > Both in abusing the shrinker interface and abusing oom_score_adj as the
> > only criterion for the oom victim selection.
>  No one is arguing that shrinker is not problematic. And would be great
>  if it is removed from lmk.  The oom_score_adj is the way user-space
>  tells the kernel what the user-space has as prio. And android is using
>  that very much. It's a core part.
> >>> Is there any documentation which describes how this is done?
> >>>
>  I have never seen it be used on
>  other linux system so what is the intended usage of oom_score_adj? Is
>  this really abusing?
> >>> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> >>> criterion on its own, well, except for the extreme sides of the range
> >>> which are defined to enforce resp. disallow selecting the task. The
> >>> global oom killer calculates the oom score as a function of the memory
> >>> consumption. Your patch simply ignores the memory consumption (and uses
> >>> pids to sort tasks with the same oom score which is just mind boggling)
> >> How much it uses is of very little importance for android.
> > But it is relevant for the global oom killer which is the main consumer of
> > the oom_score_adj.
> >
> >> The score
> >> used are only for apps and their services. System related are not
> >> touched by android lmk. The pid is only to have a unique key to be
> >> able to have it fast within a rbtree.  One idea was to use task_pid to
> >> get a strict age of process to get a round robin but since it does not
> >> matter i skipped that idea since it does not matter.
> > Pid will not tell you anything about the age. Pids do wrap around.
> >
> >>> and that is what I call the abuse. The oom score calculation might
> >>> change in future, of course, but all consumers of the oom_score_adj
> >>> really have to agree on the base which is adjusted by this tunable
> >>> otherwise you can see a lot of unexpected behavior.
> >> Then can we just define a range that is strictly for user-space?
> > This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
> > is usable.
>
> So we use them in userspace and kernel space but where is the abuse then?

I believe I have already answered that.

> >>> I would even argue that nobody outside of mm/oom_kill.c should really
> >>> have any business with this tunable.  You can of course tweak the value
> >>> from the userspace and help to chose a better oom victim this way but
> >>> that is it.
> >> Why only help? If userspace can give an exact order to kernel that
> >> must be a good thing; other wise kernel have to guess and when
> >> can that be better? 
> > Because userspace doesn't know who is the best victim in 99% cases.
>
> If user-space does not tell kernel what to it have to guess, android
> user-space does, and maybe other should too.

I believe they would do if this was so simple. This is not as easy to
answer as you might think. If you ask, everybody will consider their
task important enough to be killed.

[...]

> > In any case playing nasty games with the oom killer tunables might and
> > will lead, well, to unexpected behavior.
> 
> I don't follow. If we only use values  OOM_SCORE_ADJ_{MIN,MAX} can
> we then be "safe"?

I didn't say that. I was just trying to say that you cannot give a
single knob two different semantics depending on who is using them. That
simply won't work. So if you believe that your LMK knows better then try
to use something else for your heuristics.

Anyway, this is still offtopic I believe.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Fri 24-02-17 16:40:13, peter enderborg wrote:
> On 02/24/2017 04:03 PM, Michal Hocko wrote:
> > On Fri 24-02-17 15:42:49, peter enderborg wrote:
> >> On 02/24/2017 03:11 PM, Michal Hocko wrote:
> >>> On Fri 24-02-17 14:16:34, peter enderborg wrote:
>  On 02/24/2017 01:28 PM, Michal Hocko wrote:
> >>> [...]
> > Yeah, I strongly believe that the chosen approach is completely wrong.
> > Both in abusing the shrinker interface and abusing oom_score_adj as the
> > only criterion for the oom victim selection.
>  No one is arguing that shrinker is not problematic. And would be great
>  if it is removed from lmk.  The oom_score_adj is the way user-space
>  tells the kernel what the user-space has as prio. And android is using
>  that very much. It's a core part.
> >>> Is there any documentation which describes how this is done?
> >>>
>  I have never seen it be used on
>  other linux system so what is the intended usage of oom_score_adj? Is
>  this really abusing?
> >>> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> >>> criterion on its own, well, except for the extreme sides of the range
> >>> which are defined to enforce resp. disallow selecting the task. The
> >>> global oom killer calculates the oom score as a function of the memory
> >>> consumption. Your patch simply ignores the memory consumption (and uses
> >>> pids to sort tasks with the same oom score which is just mind boggling)
> >> How much it uses is of very little importance for android.
> > But it is relevant for the global oom killer which is the main consumer of
> > the oom_score_adj.
> >
> >> The score
> >> used are only for apps and their services. System related are not
> >> touched by android lmk. The pid is only to have a unique key to be
> >> able to have it fast within a rbtree.  One idea was to use task_pid to
> >> get a strict age of process to get a round robin but since it does not
> >> matter i skipped that idea since it does not matter.
> > Pid will not tell you anything about the age. Pids do wrap around.
> >
> >>> and that is what I call the abuse. The oom score calculation might
> >>> change in future, of course, but all consumers of the oom_score_adj
> >>> really have to agree on the base which is adjusted by this tunable
> >>> otherwise you can see a lot of unexpected behavior.
> >> Then can we just define a range that is strictly for user-space?
> > This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
> > is usable.
>
> So we use them in userspace and kernel space but where is the abuse then?

I believe I have already answered that.

> >>> I would even argue that nobody outside of mm/oom_kill.c should really
> >>> have any business with this tunable.  You can of course tweak the value
> >>> from the userspace and help to chose a better oom victim this way but
> >>> that is it.
> >> Why only help? If userspace can give an exact order to kernel that
> >> must be a good thing; other wise kernel have to guess and when
> >> can that be better? 
> > Because userspace doesn't know who is the best victim in 99% cases.
>
> If user-space does not tell kernel what to it have to guess, android
> user-space does, and maybe other should too.

I believe they would do if this was so simple. This is not as easy to
answer as you might think. If you ask, everybody will consider their
task important enough to be killed.

[...]

> > In any case playing nasty games with the oom killer tunables might and
> > will lead, well, to unexpected behavior.
> 
> I don't follow. If we only use values  OOM_SCORE_ADJ_{MIN,MAX} can
> we then be "safe"?

I didn't say that. I was just trying to say that you cannot give a
single knob two different semantics depending on who is using them. That
simply won't work. So if you believe that your LMK knows better then try
to use something else for your heuristics.

Anyway, this is still offtopic I believe.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 04:03 PM, Michal Hocko wrote:
> On Fri 24-02-17 15:42:49, peter enderborg wrote:
>> On 02/24/2017 03:11 PM, Michal Hocko wrote:
>>> On Fri 24-02-17 14:16:34, peter enderborg wrote:
 On 02/24/2017 01:28 PM, Michal Hocko wrote:
>>> [...]
> Yeah, I strongly believe that the chosen approach is completely wrong.
> Both in abusing the shrinker interface and abusing oom_score_adj as the
> only criterion for the oom victim selection.
 No one is arguing that shrinker is not problematic. And would be great
 if it is removed from lmk.  The oom_score_adj is the way user-space
 tells the kernel what the user-space has as prio. And android is using
 that very much. It's a core part.
>>> Is there any documentation which describes how this is done?
>>>
 I have never seen it be used on
 other linux system so what is the intended usage of oom_score_adj? Is
 this really abusing?
>>> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
>>> criterion on its own, well, except for the extreme sides of the range
>>> which are defined to enforce resp. disallow selecting the task. The
>>> global oom killer calculates the oom score as a function of the memory
>>> consumption. Your patch simply ignores the memory consumption (and uses
>>> pids to sort tasks with the same oom score which is just mind boggling)
>> How much it uses is of very little importance for android.
> But it is relevant for the global oom killer which is the main consumer of
> the oom_score_adj.
>
>> The score
>> used are only for apps and their services. System related are not
>> touched by android lmk. The pid is only to have a unique key to be
>> able to have it fast within a rbtree.  One idea was to use task_pid to
>> get a strict age of process to get a round robin but since it does not
>> matter i skipped that idea since it does not matter.
> Pid will not tell you anything about the age. Pids do wrap around.
>
>>> and that is what I call the abuse. The oom score calculation might
>>> change in future, of course, but all consumers of the oom_score_adj
>>> really have to agree on the base which is adjusted by this tunable
>>> otherwise you can see a lot of unexpected behavior.
>> Then can we just define a range that is strictly for user-space?
> This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
> is usable.
So we use them in userspace and kernel space but where is the abuse then?
>>> I would even argue that nobody outside of mm/oom_kill.c should really
>>> have any business with this tunable.  You can of course tweak the value
>>> from the userspace and help to chose a better oom victim this way but
>>> that is it.
>> Why only help? If userspace can give an exact order to kernel that
>> must be a good thing; other wise kernel have to guess and when
>> can that be better? 
> Because userspace doesn't know who is the best victim in 99% cases.
If user-space does not tell kernel what to it have to guess, android
user-space does, and maybe other should too.
> Android might be different, although, I am a bit skeptical - especially
> after hearing quite some complains about random application being
> killed... If you do believe that you know better then, by all means,
> implement your custom user space LMK and chose the oom victim on a
> different basis but try to understand that the global OOM killer is the
> last resort measure to make the system usable again. There is a good
> reason why the kernel uses the current badness calculation. The previous
> implementation which considered the process age ad other things was just
> too random to have a understandable behavior.
I think it make sense that there is only one way to describe what is
important what is not. And oom_kill is the last resort is one problem
for android. Android lowmemorykiller balance memory usage and
tries to be more proactive and that is why shrinkers work so well.
> In any case playing nasty games with the oom killer tunables might and
> will lead, well, to unexpected behavior.

I don't follow. If we only use values  OOM_SCORE_ADJ_{MIN,MAX} can
we then be "safe"?



Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 04:03 PM, Michal Hocko wrote:
> On Fri 24-02-17 15:42:49, peter enderborg wrote:
>> On 02/24/2017 03:11 PM, Michal Hocko wrote:
>>> On Fri 24-02-17 14:16:34, peter enderborg wrote:
 On 02/24/2017 01:28 PM, Michal Hocko wrote:
>>> [...]
> Yeah, I strongly believe that the chosen approach is completely wrong.
> Both in abusing the shrinker interface and abusing oom_score_adj as the
> only criterion for the oom victim selection.
 No one is arguing that shrinker is not problematic. And would be great
 if it is removed from lmk.  The oom_score_adj is the way user-space
 tells the kernel what the user-space has as prio. And android is using
 that very much. It's a core part.
>>> Is there any documentation which describes how this is done?
>>>
 I have never seen it be used on
 other linux system so what is the intended usage of oom_score_adj? Is
 this really abusing?
>>> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
>>> criterion on its own, well, except for the extreme sides of the range
>>> which are defined to enforce resp. disallow selecting the task. The
>>> global oom killer calculates the oom score as a function of the memory
>>> consumption. Your patch simply ignores the memory consumption (and uses
>>> pids to sort tasks with the same oom score which is just mind boggling)
>> How much it uses is of very little importance for android.
> But it is relevant for the global oom killer which is the main consumer of
> the oom_score_adj.
>
>> The score
>> used are only for apps and their services. System related are not
>> touched by android lmk. The pid is only to have a unique key to be
>> able to have it fast within a rbtree.  One idea was to use task_pid to
>> get a strict age of process to get a round robin but since it does not
>> matter i skipped that idea since it does not matter.
> Pid will not tell you anything about the age. Pids do wrap around.
>
>>> and that is what I call the abuse. The oom score calculation might
>>> change in future, of course, but all consumers of the oom_score_adj
>>> really have to agree on the base which is adjusted by this tunable
>>> otherwise you can see a lot of unexpected behavior.
>> Then can we just define a range that is strictly for user-space?
> This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
> is usable.
So we use them in userspace and kernel space but where is the abuse then?
>>> I would even argue that nobody outside of mm/oom_kill.c should really
>>> have any business with this tunable.  You can of course tweak the value
>>> from the userspace and help to chose a better oom victim this way but
>>> that is it.
>> Why only help? If userspace can give an exact order to kernel that
>> must be a good thing; other wise kernel have to guess and when
>> can that be better? 
> Because userspace doesn't know who is the best victim in 99% cases.
If user-space does not tell kernel what to it have to guess, android
user-space does, and maybe other should too.
> Android might be different, although, I am a bit skeptical - especially
> after hearing quite some complains about random application being
> killed... If you do believe that you know better then, by all means,
> implement your custom user space LMK and chose the oom victim on a
> different basis but try to understand that the global OOM killer is the
> last resort measure to make the system usable again. There is a good
> reason why the kernel uses the current badness calculation. The previous
> implementation which considered the process age ad other things was just
> too random to have a understandable behavior.
I think it make sense that there is only one way to describe what is
important what is not. And oom_kill is the last resort is one problem
for android. Android lowmemorykiller balance memory usage and
tries to be more proactive and that is why shrinkers work so well.
> In any case playing nasty games with the oom killer tunables might and
> will lead, well, to unexpected behavior.

I don't follow. If we only use values  OOM_SCORE_ADJ_{MIN,MAX} can
we then be "safe"?



Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Fri 24-02-17 15:42:49, peter enderborg wrote:
> On 02/24/2017 03:11 PM, Michal Hocko wrote:
> > On Fri 24-02-17 14:16:34, peter enderborg wrote:
> >> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> > [...]
> >>> Yeah, I strongly believe that the chosen approach is completely wrong.
> >>> Both in abusing the shrinker interface and abusing oom_score_adj as the
> >>> only criterion for the oom victim selection.
> >> No one is arguing that shrinker is not problematic. And would be great
> >> if it is removed from lmk.  The oom_score_adj is the way user-space
> >> tells the kernel what the user-space has as prio. And android is using
> >> that very much. It's a core part.
> > Is there any documentation which describes how this is done?
> >
> >> I have never seen it be used on
> >> other linux system so what is the intended usage of oom_score_adj? Is
> >> this really abusing?
> > oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> > criterion on its own, well, except for the extreme sides of the range
> > which are defined to enforce resp. disallow selecting the task. The
> > global oom killer calculates the oom score as a function of the memory
> > consumption. Your patch simply ignores the memory consumption (and uses
> > pids to sort tasks with the same oom score which is just mind boggling)
>
> How much it uses is of very little importance for android.

But it is relevant for the global oom killer which is the main consumer of
the oom_score_adj.

> The score
> used are only for apps and their services. System related are not
> touched by android lmk. The pid is only to have a unique key to be
> able to have it fast within a rbtree.  One idea was to use task_pid to
> get a strict age of process to get a round robin but since it does not
> matter i skipped that idea since it does not matter.

Pid will not tell you anything about the age. Pids do wrap around.

> > and that is what I call the abuse. The oom score calculation might
> > change in future, of course, but all consumers of the oom_score_adj
> > really have to agree on the base which is adjusted by this tunable
> > otherwise you can see a lot of unexpected behavior.
>
> Then can we just define a range that is strictly for user-space?

This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
is usable.

> > I would even argue that nobody outside of mm/oom_kill.c should really
> > have any business with this tunable.  You can of course tweak the value
> > from the userspace and help to chose a better oom victim this way but
> > that is it.
>
> Why only help? If userspace can give an exact order to kernel that
> must be a good thing; other wise kernel have to guess and when
> can that be better? 

Because userspace doesn't know who is the best victim in 99% cases.
Android might be different, although, I am a bit skeptical - especially
after hearing quite some complains about random application being
killed... If you do believe that you know better then, by all means,
implement your custom user space LMK and chose the oom victim on a
different basis but try to understand that the global OOM killer is the
last resort measure to make the system usable again. There is a good
reason why the kernel uses the current badness calculation. The previous
implementation which considered the process age ad other things was just
too random to have a understandable behavior.

In any case playing nasty games with the oom killer tunables might and
will lead, well, to unexpected behavior.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Fri 24-02-17 15:42:49, peter enderborg wrote:
> On 02/24/2017 03:11 PM, Michal Hocko wrote:
> > On Fri 24-02-17 14:16:34, peter enderborg wrote:
> >> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> > [...]
> >>> Yeah, I strongly believe that the chosen approach is completely wrong.
> >>> Both in abusing the shrinker interface and abusing oom_score_adj as the
> >>> only criterion for the oom victim selection.
> >> No one is arguing that shrinker is not problematic. And would be great
> >> if it is removed from lmk.  The oom_score_adj is the way user-space
> >> tells the kernel what the user-space has as prio. And android is using
> >> that very much. It's a core part.
> > Is there any documentation which describes how this is done?
> >
> >> I have never seen it be used on
> >> other linux system so what is the intended usage of oom_score_adj? Is
> >> this really abusing?
> > oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> > criterion on its own, well, except for the extreme sides of the range
> > which are defined to enforce resp. disallow selecting the task. The
> > global oom killer calculates the oom score as a function of the memory
> > consumption. Your patch simply ignores the memory consumption (and uses
> > pids to sort tasks with the same oom score which is just mind boggling)
>
> How much it uses is of very little importance for android.

But it is relevant for the global oom killer which is the main consumer of
the oom_score_adj.

> The score
> used are only for apps and their services. System related are not
> touched by android lmk. The pid is only to have a unique key to be
> able to have it fast within a rbtree.  One idea was to use task_pid to
> get a strict age of process to get a round robin but since it does not
> matter i skipped that idea since it does not matter.

Pid will not tell you anything about the age. Pids do wrap around.

> > and that is what I call the abuse. The oom score calculation might
> > change in future, of course, but all consumers of the oom_score_adj
> > really have to agree on the base which is adjusted by this tunable
> > otherwise you can see a lot of unexpected behavior.
>
> Then can we just define a range that is strictly for user-space?

This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
is usable.

> > I would even argue that nobody outside of mm/oom_kill.c should really
> > have any business with this tunable.  You can of course tweak the value
> > from the userspace and help to chose a better oom victim this way but
> > that is it.
>
> Why only help? If userspace can give an exact order to kernel that
> must be a good thing; other wise kernel have to guess and when
> can that be better? 

Because userspace doesn't know who is the best victim in 99% cases.
Android might be different, although, I am a bit skeptical - especially
after hearing quite some complains about random application being
killed... If you do believe that you know better then, by all means,
implement your custom user space LMK and chose the oom victim on a
different basis but try to understand that the global OOM killer is the
last resort measure to make the system usable again. There is a good
reason why the kernel uses the current badness calculation. The previous
implementation which considered the process age ad other things was just
too random to have a understandable behavior.

In any case playing nasty games with the oom killer tunables might and
will lead, well, to unexpected behavior.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 03:11 PM, Michal Hocko wrote:
> On Fri 24-02-17 14:16:34, peter enderborg wrote:
>> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> [...]
>>> Yeah, I strongly believe that the chosen approach is completely wrong.
>>> Both in abusing the shrinker interface and abusing oom_score_adj as the
>>> only criterion for the oom victim selection.
>> No one is arguing that shrinker is not problematic. And would be great
>> if it is removed from lmk.  The oom_score_adj is the way user-space
>> tells the kernel what the user-space has as prio. And android is using
>> that very much. It's a core part.
> Is there any documentation which describes how this is done?
>
>> I have never seen it be used on
>> other linux system so what is the intended usage of oom_score_adj? Is
>> this really abusing?
> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> criterion on its own, well, except for the extreme sides of the range
> which are defined to enforce resp. disallow selecting the task. The
> global oom killer calculates the oom score as a function of the memory
> consumption. Your patch simply ignores the memory consumption (and uses
> pids to sort tasks with the same oom score which is just mind boggling)
How much it uses is of very little importance for android. The score
used are only for apps and their services. System related are not touched by
android lmk. The pid is only to have a unique key to be able to have it fast 
within a rbtree.
One idea was to use task_pid to get a strict age of process to get a round robin
but since it does not matter i skipped that idea since it does not matter.
> and that is what I call the abuse. The oom score calculation might
> change in future, of course, but all consumers of the oom_score_adj
> really have to agree on the base which is adjusted by this tunable
> otherwise you can see a lot of unexpected behavior.
Then can we just define a range that is strictly for user-space?
> I would even argue that nobody outside of mm/oom_kill.c should really
> have any business with this tunable.  You can of course tweak the value
> from the userspace and help to chose a better oom victim this way but
> that is it.
Why only help? If userspace can give an exact order to kernel that
must be a good thing; other wise kernel have to guess and when
can that be better? 
> Anyway, I guess we are getting quite off-topic here.
>



Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 03:11 PM, Michal Hocko wrote:
> On Fri 24-02-17 14:16:34, peter enderborg wrote:
>> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> [...]
>>> Yeah, I strongly believe that the chosen approach is completely wrong.
>>> Both in abusing the shrinker interface and abusing oom_score_adj as the
>>> only criterion for the oom victim selection.
>> No one is arguing that shrinker is not problematic. And would be great
>> if it is removed from lmk.  The oom_score_adj is the way user-space
>> tells the kernel what the user-space has as prio. And android is using
>> that very much. It's a core part.
> Is there any documentation which describes how this is done?
>
>> I have never seen it be used on
>> other linux system so what is the intended usage of oom_score_adj? Is
>> this really abusing?
> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> criterion on its own, well, except for the extreme sides of the range
> which are defined to enforce resp. disallow selecting the task. The
> global oom killer calculates the oom score as a function of the memory
> consumption. Your patch simply ignores the memory consumption (and uses
> pids to sort tasks with the same oom score which is just mind boggling)
How much it uses is of very little importance for android. The score
used are only for apps and their services. System related are not touched by
android lmk. The pid is only to have a unique key to be able to have it fast 
within a rbtree.
One idea was to use task_pid to get a strict age of process to get a round robin
but since it does not matter i skipped that idea since it does not matter.
> and that is what I call the abuse. The oom score calculation might
> change in future, of course, but all consumers of the oom_score_adj
> really have to agree on the base which is adjusted by this tunable
> otherwise you can see a lot of unexpected behavior.
Then can we just define a range that is strictly for user-space?
> I would even argue that nobody outside of mm/oom_kill.c should really
> have any business with this tunable.  You can of course tweak the value
> from the userspace and help to chose a better oom victim this way but
> that is it.
Why only help? If userspace can give an exact order to kernel that
must be a good thing; other wise kernel have to guess and when
can that be better? 
> Anyway, I guess we are getting quite off-topic here.
>



Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Fri 24-02-17 14:16:34, peter enderborg wrote:
> On 02/24/2017 01:28 PM, Michal Hocko wrote:
[...]
> > Yeah, I strongly believe that the chosen approach is completely wrong.
> > Both in abusing the shrinker interface and abusing oom_score_adj as the
> > only criterion for the oom victim selection.
> 
> No one is arguing that shrinker is not problematic. And would be great
> if it is removed from lmk.  The oom_score_adj is the way user-space
> tells the kernel what the user-space has as prio. And android is using
> that very much. It's a core part.

Is there any documentation which describes how this is done?

> I have never seen it be used on
> other linux system so what is the intended usage of oom_score_adj? Is
> this really abusing?

oom_score_adj is used to _adjust_ the calculated oom score. It is not a
criterion on its own, well, except for the extreme sides of the range
which are defined to enforce resp. disallow selecting the task. The
global oom killer calculates the oom score as a function of the memory
consumption. Your patch simply ignores the memory consumption (and uses
pids to sort tasks with the same oom score which is just mind boggling)
and that is what I call the abuse. The oom score calculation might
change in future, of course, but all consumers of the oom_score_adj
really have to agree on the base which is adjusted by this tunable
otherwise you can see a lot of unexpected behavior.

I would even argue that nobody outside of mm/oom_kill.c should really
have any business with this tunable.  You can of course tweak the value
from the userspace and help to chose a better oom victim this way but
that is it.

Anyway, I guess we are getting quite off-topic here.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Fri 24-02-17 14:16:34, peter enderborg wrote:
> On 02/24/2017 01:28 PM, Michal Hocko wrote:
[...]
> > Yeah, I strongly believe that the chosen approach is completely wrong.
> > Both in abusing the shrinker interface and abusing oom_score_adj as the
> > only criterion for the oom victim selection.
> 
> No one is arguing that shrinker is not problematic. And would be great
> if it is removed from lmk.  The oom_score_adj is the way user-space
> tells the kernel what the user-space has as prio. And android is using
> that very much. It's a core part.

Is there any documentation which describes how this is done?

> I have never seen it be used on
> other linux system so what is the intended usage of oom_score_adj? Is
> this really abusing?

oom_score_adj is used to _adjust_ the calculated oom score. It is not a
criterion on its own, well, except for the extreme sides of the range
which are defined to enforce resp. disallow selecting the task. The
global oom killer calculates the oom score as a function of the memory
consumption. Your patch simply ignores the memory consumption (and uses
pids to sort tasks with the same oom score which is just mind boggling)
and that is what I call the abuse. The oom score calculation might
change in future, of course, but all consumers of the oom_score_adj
really have to agree on the base which is adjusted by this tunable
otherwise you can see a lot of unexpected behavior.

I would even argue that nobody outside of mm/oom_kill.c should really
have any business with this tunable.  You can of course tweak the value
from the userspace and help to chose a better oom victim this way but
that is it.

Anyway, I guess we are getting quite off-topic here.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 01:28 PM, Michal Hocko wrote:
> On Fri 24-02-17 13:19:46, peter enderborg wrote:
>> On 02/23/2017 09:36 PM, Martijn Coenen wrote:
>>> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
 So, just for context, Android does have a userland LMK daemon (using
 the mempressure notifiers) as you mentioned, but unfortunately I'm
 unaware of any devices that ship with that implementation.
>>> I've previously worked on enabling userspace lmkd for a previous
>>> release, but ran into some issues there (see below).
>>>
 This is reportedly because while the mempressure notifiers provide a
 the signal to userspace, the work the deamon then has to do to look up
 per process memory usage, in order to figure out who is best to kill
 at that point was too costly and resulted in poor device performance.
>>> In particular, mempressure requires memory cgroups to function, and we
>>> saw performance regressions due to the accounting done in mem cgroups.
>>> At the time we didn't have enough time left to solve this before the
>>> release, and we reverted back to kernel lmkd.
>>>
 So for shipping Android devices, the LMK is still needed. However, its
 not critical for basic android development, as the system will
 function without it.
>>> It will function, but it most likely will perform horribly (as the
>>> page cache will be trashed to such a level that the system will be
>>> unusable).
>>>
 Additionally I believe most vendors heavily
 customize the LMK in their vendor tree, so the value of having it in
 staging might be relatively low.

 It would be great however to get a discussion going here on what the
 ulmkd needs from the kernel in order to efficiently determine who best
 to kill, and how we might best implement that.
>>> The two main issues I think we need to address are:
>>> 1) Getting the right granularity of events from the kernel; I once
>>> tried to submit a patch upstream to address this:
>>> https://lkml.org/lkml/2016/2/24/582
>>> 2) Find out where exactly the memory cgroup overhead is coming from,
>>> and how to reduce it or work around it to acceptable levels for
>>> Android. This was also on 3.10, and maybe this has long been fixed or
>>> improved in more recent kernel versions.
>>>
>>> I don't have cycles to work on this now, but I'm happy to talk to
>>> whoever picks this up on the Android side.
>> I sent some patches that is different approach. It still uses shrinkers
>> but it has a kernel part that do the kill part better than the old one
>> but it does it the android way. The future for this is get it triggered
>> with other path's than slab shrinker. But we will not continue unless
>> we get google-android to be part of it. Hocko objected heavy on
>> the patches but seems not to see that we need something to
>> do the job before we can disconnect from shrinker.
> Yeah, I strongly believe that the chosen approach is completely wrong.
> Both in abusing the shrinker interface and abusing oom_score_adj as the
> only criterion for the oom victim selection.

No one is arguing that shrinker is not problematic. And would be great if it is 
removed from lmk.
The oom_score_adj is the way user-space tells the kernel what the user-space 
has as prio. And android
is using that very much. It's a core part. I have never seen it be used on other
linux system so what is the intended usage of oom_score_adj? Is this really 
abusing?

I think I can help out with removing  shrinker from lmk. Not using 
oom_score_adj is harder and
has a bigger impact on android, except the trivial solution by adding 
replacement
oom_user_prio and use that within android and kernel.




Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 01:28 PM, Michal Hocko wrote:
> On Fri 24-02-17 13:19:46, peter enderborg wrote:
>> On 02/23/2017 09:36 PM, Martijn Coenen wrote:
>>> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
 So, just for context, Android does have a userland LMK daemon (using
 the mempressure notifiers) as you mentioned, but unfortunately I'm
 unaware of any devices that ship with that implementation.
>>> I've previously worked on enabling userspace lmkd for a previous
>>> release, but ran into some issues there (see below).
>>>
 This is reportedly because while the mempressure notifiers provide a
 the signal to userspace, the work the deamon then has to do to look up
 per process memory usage, in order to figure out who is best to kill
 at that point was too costly and resulted in poor device performance.
>>> In particular, mempressure requires memory cgroups to function, and we
>>> saw performance regressions due to the accounting done in mem cgroups.
>>> At the time we didn't have enough time left to solve this before the
>>> release, and we reverted back to kernel lmkd.
>>>
 So for shipping Android devices, the LMK is still needed. However, its
 not critical for basic android development, as the system will
 function without it.
>>> It will function, but it most likely will perform horribly (as the
>>> page cache will be trashed to such a level that the system will be
>>> unusable).
>>>
 Additionally I believe most vendors heavily
 customize the LMK in their vendor tree, so the value of having it in
 staging might be relatively low.

 It would be great however to get a discussion going here on what the
 ulmkd needs from the kernel in order to efficiently determine who best
 to kill, and how we might best implement that.
>>> The two main issues I think we need to address are:
>>> 1) Getting the right granularity of events from the kernel; I once
>>> tried to submit a patch upstream to address this:
>>> https://lkml.org/lkml/2016/2/24/582
>>> 2) Find out where exactly the memory cgroup overhead is coming from,
>>> and how to reduce it or work around it to acceptable levels for
>>> Android. This was also on 3.10, and maybe this has long been fixed or
>>> improved in more recent kernel versions.
>>>
>>> I don't have cycles to work on this now, but I'm happy to talk to
>>> whoever picks this up on the Android side.
>> I sent some patches that is different approach. It still uses shrinkers
>> but it has a kernel part that do the kill part better than the old one
>> but it does it the android way. The future for this is get it triggered
>> with other path's than slab shrinker. But we will not continue unless
>> we get google-android to be part of it. Hocko objected heavy on
>> the patches but seems not to see that we need something to
>> do the job before we can disconnect from shrinker.
> Yeah, I strongly believe that the chosen approach is completely wrong.
> Both in abusing the shrinker interface and abusing oom_score_adj as the
> only criterion for the oom victim selection.

No one is arguing that shrinker is not problematic. And would be great if it is 
removed from lmk.
The oom_score_adj is the way user-space tells the kernel what the user-space 
has as prio. And android
is using that very much. It's a core part. I have never seen it be used on other
linux system so what is the intended usage of oom_score_adj? Is this really 
abusing?

I think I can help out with removing  shrinker from lmk. Not using 
oom_score_adj is harder and
has a bigger impact on android, except the trivial solution by adding 
replacement
oom_user_prio and use that within android and kernel.




Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Fri 24-02-17 13:19:46, peter enderborg wrote:
> On 02/23/2017 09:36 PM, Martijn Coenen wrote:
> > On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
> >> So, just for context, Android does have a userland LMK daemon (using
> >> the mempressure notifiers) as you mentioned, but unfortunately I'm
> >> unaware of any devices that ship with that implementation.
> > I've previously worked on enabling userspace lmkd for a previous
> > release, but ran into some issues there (see below).
> >
> >> This is reportedly because while the mempressure notifiers provide a
> >> the signal to userspace, the work the deamon then has to do to look up
> >> per process memory usage, in order to figure out who is best to kill
> >> at that point was too costly and resulted in poor device performance.
> > In particular, mempressure requires memory cgroups to function, and we
> > saw performance regressions due to the accounting done in mem cgroups.
> > At the time we didn't have enough time left to solve this before the
> > release, and we reverted back to kernel lmkd.
> >
> >> So for shipping Android devices, the LMK is still needed. However, its
> >> not critical for basic android development, as the system will
> >> function without it.
> > It will function, but it most likely will perform horribly (as the
> > page cache will be trashed to such a level that the system will be
> > unusable).
> >
> >> Additionally I believe most vendors heavily
> >> customize the LMK in their vendor tree, so the value of having it in
> >> staging might be relatively low.
> >>
> >> It would be great however to get a discussion going here on what the
> >> ulmkd needs from the kernel in order to efficiently determine who best
> >> to kill, and how we might best implement that.
> > The two main issues I think we need to address are:
> > 1) Getting the right granularity of events from the kernel; I once
> > tried to submit a patch upstream to address this:
> > https://lkml.org/lkml/2016/2/24/582
> > 2) Find out where exactly the memory cgroup overhead is coming from,
> > and how to reduce it or work around it to acceptable levels for
> > Android. This was also on 3.10, and maybe this has long been fixed or
> > improved in more recent kernel versions.
> >
> > I don't have cycles to work on this now, but I'm happy to talk to
> > whoever picks this up on the Android side.
> I sent some patches that is different approach. It still uses shrinkers
> but it has a kernel part that do the kill part better than the old one
> but it does it the android way. The future for this is get it triggered
> with other path's than slab shrinker. But we will not continue unless
> we get google-android to be part of it. Hocko objected heavy on
> the patches but seems not to see that we need something to
> do the job before we can disconnect from shrinker.

Yeah, I strongly believe that the chosen approach is completely wrong.
Both in abusing the shrinker interface and abusing oom_score_adj as the
only criterion for the oom victim selection.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Fri 24-02-17 13:19:46, peter enderborg wrote:
> On 02/23/2017 09:36 PM, Martijn Coenen wrote:
> > On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
> >> So, just for context, Android does have a userland LMK daemon (using
> >> the mempressure notifiers) as you mentioned, but unfortunately I'm
> >> unaware of any devices that ship with that implementation.
> > I've previously worked on enabling userspace lmkd for a previous
> > release, but ran into some issues there (see below).
> >
> >> This is reportedly because while the mempressure notifiers provide a
> >> the signal to userspace, the work the deamon then has to do to look up
> >> per process memory usage, in order to figure out who is best to kill
> >> at that point was too costly and resulted in poor device performance.
> > In particular, mempressure requires memory cgroups to function, and we
> > saw performance regressions due to the accounting done in mem cgroups.
> > At the time we didn't have enough time left to solve this before the
> > release, and we reverted back to kernel lmkd.
> >
> >> So for shipping Android devices, the LMK is still needed. However, its
> >> not critical for basic android development, as the system will
> >> function without it.
> > It will function, but it most likely will perform horribly (as the
> > page cache will be trashed to such a level that the system will be
> > unusable).
> >
> >> Additionally I believe most vendors heavily
> >> customize the LMK in their vendor tree, so the value of having it in
> >> staging might be relatively low.
> >>
> >> It would be great however to get a discussion going here on what the
> >> ulmkd needs from the kernel in order to efficiently determine who best
> >> to kill, and how we might best implement that.
> > The two main issues I think we need to address are:
> > 1) Getting the right granularity of events from the kernel; I once
> > tried to submit a patch upstream to address this:
> > https://lkml.org/lkml/2016/2/24/582
> > 2) Find out where exactly the memory cgroup overhead is coming from,
> > and how to reduce it or work around it to acceptable levels for
> > Android. This was also on 3.10, and maybe this has long been fixed or
> > improved in more recent kernel versions.
> >
> > I don't have cycles to work on this now, but I'm happy to talk to
> > whoever picks this up on the Android side.
> I sent some patches that is different approach. It still uses shrinkers
> but it has a kernel part that do the kill part better than the old one
> but it does it the android way. The future for this is get it triggered
> with other path's than slab shrinker. But we will not continue unless
> we get google-android to be part of it. Hocko objected heavy on
> the patches but seems not to see that we need something to
> do the job before we can disconnect from shrinker.

Yeah, I strongly believe that the chosen approach is completely wrong.
Both in abusing the shrinker interface and abusing oom_score_adj as the
only criterion for the oom victim selection.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/23/2017 09:36 PM, Martijn Coenen wrote:
> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
>> So, just for context, Android does have a userland LMK daemon (using
>> the mempressure notifiers) as you mentioned, but unfortunately I'm
>> unaware of any devices that ship with that implementation.
> I've previously worked on enabling userspace lmkd for a previous
> release, but ran into some issues there (see below).
>
>> This is reportedly because while the mempressure notifiers provide a
>> the signal to userspace, the work the deamon then has to do to look up
>> per process memory usage, in order to figure out who is best to kill
>> at that point was too costly and resulted in poor device performance.
> In particular, mempressure requires memory cgroups to function, and we
> saw performance regressions due to the accounting done in mem cgroups.
> At the time we didn't have enough time left to solve this before the
> release, and we reverted back to kernel lmkd.
>
>> So for shipping Android devices, the LMK is still needed. However, its
>> not critical for basic android development, as the system will
>> function without it.
> It will function, but it most likely will perform horribly (as the
> page cache will be trashed to such a level that the system will be
> unusable).
>
>> Additionally I believe most vendors heavily
>> customize the LMK in their vendor tree, so the value of having it in
>> staging might be relatively low.
>>
>> It would be great however to get a discussion going here on what the
>> ulmkd needs from the kernel in order to efficiently determine who best
>> to kill, and how we might best implement that.
> The two main issues I think we need to address are:
> 1) Getting the right granularity of events from the kernel; I once
> tried to submit a patch upstream to address this:
> https://lkml.org/lkml/2016/2/24/582
> 2) Find out where exactly the memory cgroup overhead is coming from,
> and how to reduce it or work around it to acceptable levels for
> Android. This was also on 3.10, and maybe this has long been fixed or
> improved in more recent kernel versions.
>
> I don't have cycles to work on this now, but I'm happy to talk to
> whoever picks this up on the Android side.
I sent some patches that is different approach. It still uses shrinkers
but it has a kernel part that do the kill part better than the old one
but it does it the android way. The future for this is get it triggered
with other path's than slab shrinker. But we will not continue unless
we get google-android to be part of it. Hocko objected heavy on
the patches but seems not to see that we need something to
do the job before we can disconnect from shrinker.

> Thanks,
> Martijn
>
>> thanks
>> -john




Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/23/2017 09:36 PM, Martijn Coenen wrote:
> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
>> So, just for context, Android does have a userland LMK daemon (using
>> the mempressure notifiers) as you mentioned, but unfortunately I'm
>> unaware of any devices that ship with that implementation.
> I've previously worked on enabling userspace lmkd for a previous
> release, but ran into some issues there (see below).
>
>> This is reportedly because while the mempressure notifiers provide a
>> the signal to userspace, the work the deamon then has to do to look up
>> per process memory usage, in order to figure out who is best to kill
>> at that point was too costly and resulted in poor device performance.
> In particular, mempressure requires memory cgroups to function, and we
> saw performance regressions due to the accounting done in mem cgroups.
> At the time we didn't have enough time left to solve this before the
> release, and we reverted back to kernel lmkd.
>
>> So for shipping Android devices, the LMK is still needed. However, its
>> not critical for basic android development, as the system will
>> function without it.
> It will function, but it most likely will perform horribly (as the
> page cache will be trashed to such a level that the system will be
> unusable).
>
>> Additionally I believe most vendors heavily
>> customize the LMK in their vendor tree, so the value of having it in
>> staging might be relatively low.
>>
>> It would be great however to get a discussion going here on what the
>> ulmkd needs from the kernel in order to efficiently determine who best
>> to kill, and how we might best implement that.
> The two main issues I think we need to address are:
> 1) Getting the right granularity of events from the kernel; I once
> tried to submit a patch upstream to address this:
> https://lkml.org/lkml/2016/2/24/582
> 2) Find out where exactly the memory cgroup overhead is coming from,
> and how to reduce it or work around it to acceptable levels for
> Android. This was also on 3.10, and maybe this has long been fixed or
> improved in more recent kernel versions.
>
> I don't have cycles to work on this now, but I'm happy to talk to
> whoever picks this up on the Android side.
I sent some patches that is different approach. It still uses shrinkers
but it has a kernel part that do the kill part better than the old one
but it does it the android way. The future for this is get it triggered
with other path's than slab shrinker. But we will not continue unless
we get google-android to be part of it. Hocko objected heavy on
the patches but seems not to see that we need something to
do the job before we can disconnect from shrinker.

> Thanks,
> Martijn
>
>> thanks
>> -john




Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Thu 23-02-17 12:24:57, John Stultz wrote:
> On Wed, Feb 22, 2017 at 4:01 AM, Michal Hocko  wrote:
> > From: Michal Hocko 
> >
> > Lowmemory killer is sitting in the staging tree since 2008 without any
> > serious interest for fixing issues brought up by the MM folks. The main
> > objection is that the implementation is basically broken by design:
> > - it hooks into slab shrinker API which is not suitable for this
> >   purpose. lowmem_count implementation just shows this nicely.
> >   There is no scaling based on the memory pressure and no
> >   feedback to the generic shrinker infrastructure.
> >   Moreover lowmem_scan is called way too often for the heavy
> >   work it performs.
> > - it is not reclaim context aware - no NUMA and/or memcg
> >   awareness.
> >
> > As the code stands right now it just adds a maintenance overhead when
> > core MM changes have to update lowmemorykiller.c as well. It also seems
> > that the alternative LMK implementation will be solely in the userspace
> > so this code has no perspective it seems. The staging tree is supposed
> > to be for a code which needs to be put in shape before it can be merged
> > which is not the case here obviously.
> 
> So, just for context, Android does have a userland LMK daemon (using
> the mempressure notifiers) as you mentioned, but unfortunately I'm
> unaware of any devices that ship with that implementation.
> 
> This is reportedly because while the mempressure notifiers provide a
> the signal to userspace, the work the deamon then has to do to look up
> per process memory usage, in order to figure out who is best to kill
> at that point was too costly and resulted in poor device performance.

What was the expensive part?

> So for shipping Android devices, the LMK is still needed. However, its
> not critical for basic android development, as the system will
> function without it. Additionally I believe most vendors heavily
> customize the LMK in their vendor tree, so the value of having it in
> staging might be relatively low.

This is even a stronger reason to drop it from the tree. We do not want
to maintain the code which is not used in fact.
 
> It would be great however to get a discussion going here on what the
> ulmkd needs from the kernel in order to efficiently determine who best
> to kill, and how we might best implement that.

I would really like to see this happen and, to be honest, it should have
happened quite some time ago (around the time when the lmk was merged to
the staging tree).
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Thu 23-02-17 12:24:57, John Stultz wrote:
> On Wed, Feb 22, 2017 at 4:01 AM, Michal Hocko  wrote:
> > From: Michal Hocko 
> >
> > Lowmemory killer is sitting in the staging tree since 2008 without any
> > serious interest for fixing issues brought up by the MM folks. The main
> > objection is that the implementation is basically broken by design:
> > - it hooks into slab shrinker API which is not suitable for this
> >   purpose. lowmem_count implementation just shows this nicely.
> >   There is no scaling based on the memory pressure and no
> >   feedback to the generic shrinker infrastructure.
> >   Moreover lowmem_scan is called way too often for the heavy
> >   work it performs.
> > - it is not reclaim context aware - no NUMA and/or memcg
> >   awareness.
> >
> > As the code stands right now it just adds a maintenance overhead when
> > core MM changes have to update lowmemorykiller.c as well. It also seems
> > that the alternative LMK implementation will be solely in the userspace
> > so this code has no perspective it seems. The staging tree is supposed
> > to be for a code which needs to be put in shape before it can be merged
> > which is not the case here obviously.
> 
> So, just for context, Android does have a userland LMK daemon (using
> the mempressure notifiers) as you mentioned, but unfortunately I'm
> unaware of any devices that ship with that implementation.
> 
> This is reportedly because while the mempressure notifiers provide a
> the signal to userspace, the work the deamon then has to do to look up
> per process memory usage, in order to figure out who is best to kill
> at that point was too costly and resulted in poor device performance.

What was the expensive part?

> So for shipping Android devices, the LMK is still needed. However, its
> not critical for basic android development, as the system will
> function without it. Additionally I believe most vendors heavily
> customize the LMK in their vendor tree, so the value of having it in
> staging might be relatively low.

This is even a stronger reason to drop it from the tree. We do not want
to maintain the code which is not used in fact.
 
> It would be great however to get a discussion going here on what the
> ulmkd needs from the kernel in order to efficiently determine who best
> to kill, and how we might best implement that.

I would really like to see this happen and, to be honest, it should have
happened quite some time ago (around the time when the lmk was merged to
the staging tree).
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Thu 23-02-17 21:36:00, Martijn Coenen wrote:
> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
[...]
> > This is reportedly because while the mempressure notifiers provide a
> > the signal to userspace, the work the deamon then has to do to look up
> > per process memory usage, in order to figure out who is best to kill
> > at that point was too costly and resulted in poor device performance.
> 
> In particular, mempressure requires memory cgroups to function, and we
> saw performance regressions due to the accounting done in mem cgroups.
> At the time we didn't have enough time left to solve this before the
> release, and we reverted back to kernel lmkd.

I would be more than interested to hear details. We used to have some
visible charge path performance footprint but this should be gone now.

[...]
> > It would be great however to get a discussion going here on what the
> > ulmkd needs from the kernel in order to efficiently determine who best
> > to kill, and how we might best implement that.
> 
> The two main issues I think we need to address are:
> 1) Getting the right granularity of events from the kernel; I once
> tried to submit a patch upstream to address this:
> https://lkml.org/lkml/2016/2/24/582

Not only that, the implementation of tht vmpressure needs some serious
rethinking as well. The current one can hit critical events
unexpectedly. The calculation also doesn't consider slab reclaim
sensibly.

> 2) Find out where exactly the memory cgroup overhead is coming from,
> and how to reduce it or work around it to acceptable levels for
> Android. This was also on 3.10, and maybe this has long been fixed or
> improved in more recent kernel versions.

3e32cb2e0a12 ("mm: memcontrol: lockless page counters") has improved
situation a lot as all the charging is lockless since then (3.19).
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread Michal Hocko
On Thu 23-02-17 21:36:00, Martijn Coenen wrote:
> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
[...]
> > This is reportedly because while the mempressure notifiers provide a
> > the signal to userspace, the work the deamon then has to do to look up
> > per process memory usage, in order to figure out who is best to kill
> > at that point was too costly and resulted in poor device performance.
> 
> In particular, mempressure requires memory cgroups to function, and we
> saw performance regressions due to the accounting done in mem cgroups.
> At the time we didn't have enough time left to solve this before the
> release, and we reverted back to kernel lmkd.

I would be more than interested to hear details. We used to have some
visible charge path performance footprint but this should be gone now.

[...]
> > It would be great however to get a discussion going here on what the
> > ulmkd needs from the kernel in order to efficiently determine who best
> > to kill, and how we might best implement that.
> 
> The two main issues I think we need to address are:
> 1) Getting the right granularity of events from the kernel; I once
> tried to submit a patch upstream to address this:
> https://lkml.org/lkml/2016/2/24/582

Not only that, the implementation of tht vmpressure needs some serious
rethinking as well. The current one can hit critical events
unexpectedly. The calculation also doesn't consider slab reclaim
sensibly.

> 2) Find out where exactly the memory cgroup overhead is coming from,
> and how to reduce it or work around it to acceptable levels for
> Android. This was also on 3.10, and maybe this has long been fixed or
> improved in more recent kernel versions.

3e32cb2e0a12 ("mm: memcontrol: lockless page counters") has improved
situation a lot as all the charging is lockless since then (3.19).
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-23 Thread Martijn Coenen
On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
>
> So, just for context, Android does have a userland LMK daemon (using
> the mempressure notifiers) as you mentioned, but unfortunately I'm
> unaware of any devices that ship with that implementation.

I've previously worked on enabling userspace lmkd for a previous
release, but ran into some issues there (see below).

> This is reportedly because while the mempressure notifiers provide a
> the signal to userspace, the work the deamon then has to do to look up
> per process memory usage, in order to figure out who is best to kill
> at that point was too costly and resulted in poor device performance.

In particular, mempressure requires memory cgroups to function, and we
saw performance regressions due to the accounting done in mem cgroups.
At the time we didn't have enough time left to solve this before the
release, and we reverted back to kernel lmkd.

>
> So for shipping Android devices, the LMK is still needed. However, its
> not critical for basic android development, as the system will
> function without it.

It will function, but it most likely will perform horribly (as the
page cache will be trashed to such a level that the system will be
unusable).

>Additionally I believe most vendors heavily
> customize the LMK in their vendor tree, so the value of having it in
> staging might be relatively low.
>
> It would be great however to get a discussion going here on what the
> ulmkd needs from the kernel in order to efficiently determine who best
> to kill, and how we might best implement that.

The two main issues I think we need to address are:
1) Getting the right granularity of events from the kernel; I once
tried to submit a patch upstream to address this:
https://lkml.org/lkml/2016/2/24/582
2) Find out where exactly the memory cgroup overhead is coming from,
and how to reduce it or work around it to acceptable levels for
Android. This was also on 3.10, and maybe this has long been fixed or
improved in more recent kernel versions.

I don't have cycles to work on this now, but I'm happy to talk to
whoever picks this up on the Android side.

Thanks,
Martijn

>
> thanks
> -john


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-23 Thread Martijn Coenen
On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
>
> So, just for context, Android does have a userland LMK daemon (using
> the mempressure notifiers) as you mentioned, but unfortunately I'm
> unaware of any devices that ship with that implementation.

I've previously worked on enabling userspace lmkd for a previous
release, but ran into some issues there (see below).

> This is reportedly because while the mempressure notifiers provide a
> the signal to userspace, the work the deamon then has to do to look up
> per process memory usage, in order to figure out who is best to kill
> at that point was too costly and resulted in poor device performance.

In particular, mempressure requires memory cgroups to function, and we
saw performance regressions due to the accounting done in mem cgroups.
At the time we didn't have enough time left to solve this before the
release, and we reverted back to kernel lmkd.

>
> So for shipping Android devices, the LMK is still needed. However, its
> not critical for basic android development, as the system will
> function without it.

It will function, but it most likely will perform horribly (as the
page cache will be trashed to such a level that the system will be
unusable).

>Additionally I believe most vendors heavily
> customize the LMK in their vendor tree, so the value of having it in
> staging might be relatively low.
>
> It would be great however to get a discussion going here on what the
> ulmkd needs from the kernel in order to efficiently determine who best
> to kill, and how we might best implement that.

The two main issues I think we need to address are:
1) Getting the right granularity of events from the kernel; I once
tried to submit a patch upstream to address this:
https://lkml.org/lkml/2016/2/24/582
2) Find out where exactly the memory cgroup overhead is coming from,
and how to reduce it or work around it to acceptable levels for
Android. This was also on 3.10, and maybe this has long been fixed or
improved in more recent kernel versions.

I don't have cycles to work on this now, but I'm happy to talk to
whoever picks this up on the Android side.

Thanks,
Martijn

>
> thanks
> -john


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-23 Thread John Stultz
On Wed, Feb 22, 2017 at 4:01 AM, Michal Hocko  wrote:
> From: Michal Hocko 
>
> Lowmemory killer is sitting in the staging tree since 2008 without any
> serious interest for fixing issues brought up by the MM folks. The main
> objection is that the implementation is basically broken by design:
> - it hooks into slab shrinker API which is not suitable for this
>   purpose. lowmem_count implementation just shows this nicely.
>   There is no scaling based on the memory pressure and no
>   feedback to the generic shrinker infrastructure.
>   Moreover lowmem_scan is called way too often for the heavy
>   work it performs.
> - it is not reclaim context aware - no NUMA and/or memcg
>   awareness.
>
> As the code stands right now it just adds a maintenance overhead when
> core MM changes have to update lowmemorykiller.c as well. It also seems
> that the alternative LMK implementation will be solely in the userspace
> so this code has no perspective it seems. The staging tree is supposed
> to be for a code which needs to be put in shape before it can be merged
> which is not the case here obviously.

So, just for context, Android does have a userland LMK daemon (using
the mempressure notifiers) as you mentioned, but unfortunately I'm
unaware of any devices that ship with that implementation.

This is reportedly because while the mempressure notifiers provide a
the signal to userspace, the work the deamon then has to do to look up
per process memory usage, in order to figure out who is best to kill
at that point was too costly and resulted in poor device performance.

So for shipping Android devices, the LMK is still needed. However, its
not critical for basic android development, as the system will
function without it. Additionally I believe most vendors heavily
customize the LMK in their vendor tree, so the value of having it in
staging might be relatively low.

It would be great however to get a discussion going here on what the
ulmkd needs from the kernel in order to efficiently determine who best
to kill, and how we might best implement that.

thanks
-john


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-23 Thread John Stultz
On Wed, Feb 22, 2017 at 4:01 AM, Michal Hocko  wrote:
> From: Michal Hocko 
>
> Lowmemory killer is sitting in the staging tree since 2008 without any
> serious interest for fixing issues brought up by the MM folks. The main
> objection is that the implementation is basically broken by design:
> - it hooks into slab shrinker API which is not suitable for this
>   purpose. lowmem_count implementation just shows this nicely.
>   There is no scaling based on the memory pressure and no
>   feedback to the generic shrinker infrastructure.
>   Moreover lowmem_scan is called way too often for the heavy
>   work it performs.
> - it is not reclaim context aware - no NUMA and/or memcg
>   awareness.
>
> As the code stands right now it just adds a maintenance overhead when
> core MM changes have to update lowmemorykiller.c as well. It also seems
> that the alternative LMK implementation will be solely in the userspace
> so this code has no perspective it seems. The staging tree is supposed
> to be for a code which needs to be put in shape before it can be merged
> which is not the case here obviously.

So, just for context, Android does have a userland LMK daemon (using
the mempressure notifiers) as you mentioned, but unfortunately I'm
unaware of any devices that ship with that implementation.

This is reportedly because while the mempressure notifiers provide a
the signal to userspace, the work the deamon then has to do to look up
per process memory usage, in order to figure out who is best to kill
at that point was too costly and resulted in poor device performance.

So for shipping Android devices, the LMK is still needed. However, its
not critical for basic android development, as the system will
function without it. Additionally I believe most vendors heavily
customize the LMK in their vendor tree, so the value of having it in
staging might be relatively low.

It would be great however to get a discussion going here on what the
ulmkd needs from the kernel in order to efficiently determine who best
to kill, and how we might best implement that.

thanks
-john


[PATCH] staging, android: remove lowmemory killer from the tree

2017-02-22 Thread Michal Hocko
From: Michal Hocko 

Lowmemory killer is sitting in the staging tree since 2008 without any
serious interest for fixing issues brought up by the MM folks. The main
objection is that the implementation is basically broken by design:
- it hooks into slab shrinker API which is not suitable for this
  purpose. lowmem_count implementation just shows this nicely.
  There is no scaling based on the memory pressure and no
  feedback to the generic shrinker infrastructure.
  Moreover lowmem_scan is called way too often for the heavy
  work it performs.
- it is not reclaim context aware - no NUMA and/or memcg
  awareness.

As the code stands right now it just adds a maintenance overhead when
core MM changes have to update lowmemorykiller.c as well. It also seems
that the alternative LMK implementation will be solely in the userspace
so this code has no perspective it seems. The staging tree is supposed
to be for a code which needs to be put in shape before it can be merged
which is not the case here obviously.

Signed-off-by: Michal Hocko 
---
 drivers/staging/android/Kconfig   |  10 --
 drivers/staging/android/Makefile  |   1 -
 drivers/staging/android/lowmemorykiller.c | 212 --
 include/linux/sched.h |   4 -
 4 files changed, 227 deletions(-)
 delete mode 100644 drivers/staging/android/lowmemorykiller.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 6c00d6f765c6..71a50b99caff 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -14,16 +14,6 @@ config ASHMEM
  It is, in theory, a good memory allocator for low-memory devices,
  because it can discard shared memory units when under memory pressure.
 
-config ANDROID_LOW_MEMORY_KILLER
-   bool "Android Low Memory Killer"
-   ---help---
- Registers processes to be killed when low memory conditions, this is 
useful
- as there is no particular swap space on android.
-
- The registered process will kill according to the priorities in 
android init
- scripts (/init.rc), and it defines priority values with minimum free 
memory size
- for each priority.
-
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7ed1be798909..7cf1564a49a5 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -3,4 +3,3 @@ ccflags-y += -I$(src)   # needed for trace 
events
 obj-y  += ion/
 
 obj-$(CONFIG_ASHMEM)   += ashmem.o
-obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o
diff --git a/drivers/staging/android/lowmemorykiller.c 
b/drivers/staging/android/lowmemorykiller.c
deleted file mode 100644
index ec3b66561412..
--- a/drivers/staging/android/lowmemorykiller.c
+++ /dev/null
@@ -1,212 +0,0 @@
-/* drivers/misc/lowmemorykiller.c
- *
- * The lowmemorykiller driver lets user-space specify a set of memory 
thresholds
- * where processes with a range of oom_score_adj values will get killed. 
Specify
- * the minimum oom_score_adj values in
- * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
- * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
- * separated list of numbers in ascending order.
- *
- * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
- * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill
- * processes with a oom_score_adj value of 8 or higher when the free memory
- * drops below 4096 pages and kill processes with a oom_score_adj value of 0 or
- * higher when the free memory drops below 1024 pages.
- *
- * The driver considers memory used for caches to be free, but if a large
- * percentage of the cached memory is locked this can be very inaccurate
- * and processes may not get killed until the normal oom killer is triggered.
- *
- * Copyright (C) 2007-2008 Google, Inc.
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-static u32 lowmem_debug_level = 1;
-static short lowmem_adj[6] = {
-   0,
-   1,
-   6,
-   12,
-};
-
-static int lowmem_adj_size = 4;
-static int lowmem_minfree[6] = 

[PATCH] staging, android: remove lowmemory killer from the tree

2017-02-22 Thread Michal Hocko
From: Michal Hocko 

Lowmemory killer is sitting in the staging tree since 2008 without any
serious interest for fixing issues brought up by the MM folks. The main
objection is that the implementation is basically broken by design:
- it hooks into slab shrinker API which is not suitable for this
  purpose. lowmem_count implementation just shows this nicely.
  There is no scaling based on the memory pressure and no
  feedback to the generic shrinker infrastructure.
  Moreover lowmem_scan is called way too often for the heavy
  work it performs.
- it is not reclaim context aware - no NUMA and/or memcg
  awareness.

As the code stands right now it just adds a maintenance overhead when
core MM changes have to update lowmemorykiller.c as well. It also seems
that the alternative LMK implementation will be solely in the userspace
so this code has no perspective it seems. The staging tree is supposed
to be for a code which needs to be put in shape before it can be merged
which is not the case here obviously.

Signed-off-by: Michal Hocko 
---
 drivers/staging/android/Kconfig   |  10 --
 drivers/staging/android/Makefile  |   1 -
 drivers/staging/android/lowmemorykiller.c | 212 --
 include/linux/sched.h |   4 -
 4 files changed, 227 deletions(-)
 delete mode 100644 drivers/staging/android/lowmemorykiller.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 6c00d6f765c6..71a50b99caff 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -14,16 +14,6 @@ config ASHMEM
  It is, in theory, a good memory allocator for low-memory devices,
  because it can discard shared memory units when under memory pressure.
 
-config ANDROID_LOW_MEMORY_KILLER
-   bool "Android Low Memory Killer"
-   ---help---
- Registers processes to be killed when low memory conditions, this is 
useful
- as there is no particular swap space on android.
-
- The registered process will kill according to the priorities in 
android init
- scripts (/init.rc), and it defines priority values with minimum free 
memory size
- for each priority.
-
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7ed1be798909..7cf1564a49a5 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -3,4 +3,3 @@ ccflags-y += -I$(src)   # needed for trace 
events
 obj-y  += ion/
 
 obj-$(CONFIG_ASHMEM)   += ashmem.o
-obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o
diff --git a/drivers/staging/android/lowmemorykiller.c 
b/drivers/staging/android/lowmemorykiller.c
deleted file mode 100644
index ec3b66561412..
--- a/drivers/staging/android/lowmemorykiller.c
+++ /dev/null
@@ -1,212 +0,0 @@
-/* drivers/misc/lowmemorykiller.c
- *
- * The lowmemorykiller driver lets user-space specify a set of memory 
thresholds
- * where processes with a range of oom_score_adj values will get killed. 
Specify
- * the minimum oom_score_adj values in
- * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
- * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
- * separated list of numbers in ascending order.
- *
- * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
- * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill
- * processes with a oom_score_adj value of 8 or higher when the free memory
- * drops below 4096 pages and kill processes with a oom_score_adj value of 0 or
- * higher when the free memory drops below 1024 pages.
- *
- * The driver considers memory used for caches to be free, but if a large
- * percentage of the cached memory is locked this can be very inaccurate
- * and processes may not get killed until the normal oom killer is triggered.
- *
- * Copyright (C) 2007-2008 Google, Inc.
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-static u32 lowmem_debug_level = 1;
-static short lowmem_adj[6] = {
-   0,
-   1,
-   6,
-   12,
-};
-
-static int lowmem_adj_size = 4;
-static int lowmem_minfree[6] = {
-   3 * 512,/* 6MB