Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

2017-04-03 Thread peter enderborg
On 03/31/2017 07:53 PM, Douglas Anderson wrote:
> Sometimes when we're out of memory the OOM killer decides to kill a
> process that's in binder_thread_read().  If we happen to be waiting
> for work we'll get the kill signal and wake up.  That's good.  ...but
> then we try to grab the binder lock before we return.  That's bad.
>
> The problem is that someone else might be holding the one true global
> binder lock.  If that one other process is blocked then we can't
> finish exiting.  In the worst case, the other process might be blocked
> waiting for memory.  In that case we'll have a really hard time
> exiting.
>
> On older kernels that don't have the OOM reaper (or something
> similar), like kernel 4.4, this is a really big problem and we end up
> with a simple deadlock because:
> * Once we pick a process to OOM kill we won't pick another--we first
>   wait for the process we picked to die.  The reasoning is that we've
>   given the doomed process access to special memory pools so it can
>   quit quickly and we don't have special pool memory to go around.
> * We don't have any type of "special access donation" that would give
>   the mutex holder our special access.
>
> On kernel 4.4 w/ binder patches, we easily see this happen:
>
> We just attempted this OOM kill:
>   Killed process 4132 (Binder_1) total-vm:949904kB, anon-rss:4kB, file-rss:0kB
>
> The doomed task:
>   Stack traceback for pid 4132
>4132 3380  00   D  Binder_1
>   Call trace:
>__switch_to+0x9c/0xa8
>__schedule+0x504/0x740
>schedule+0x88/0xa8
>schedule_preempt_disabled+0x28/0x44
>__mutex_lock_slowpath+0xf8/0x1a4
>mutex_lock+0x4c/0x68
>binder_thread_read+0x68c/0x11bc
>binder_ioctl_write_read.constprop.46+0x1e8/0x318
>binder_ioctl+0x370/0x778
>compat_SyS_ioctl+0x134/0x10ac
>el0_svc_naked+0x24/0x28
>
> The binder lock holder:
>4001 3380  04   RBinder_7
>   Call trace:
>__switch_to+0x9c/0xa8
>__schedule+0x504/0x740
>preempt_schedule_common+0x28/0x48
>preempt_schedule+0x24/0x2c
>_raw_spin_unlock+0x44/0x50
>list_lru_count_one+0x40/0x50
>super_cache_count+0x68/0xa0
>shrink_slab.part.54+0xfc/0x4a4
>shrink_zone+0xa8/0x198
>try_to_free_pages+0x408/0x590
>__alloc_pages_nodemask+0x60c/0x95c
>__read_swap_cache_async+0x98/0x214
>read_swap_cache_async+0x48/0x7c
>swapin_readahead+0x188/0x1b8
>handle_mm_fault+0xbf8/0x1050
>do_page_fault+0x140/0x2dc
>do_mem_abort+0x64/0xd8
>   Exception stack: < ... cut ... >
>el1_da+0x18/0x78
>binder_ioctl+0x370/0x778
>compat_SyS_ioctl+0x134/0x10ac
>el0_svc_naked+0x24/0x28
>
> There's really not a lot of reason to grab the binder lock when we're
> just going to exit anyway.  Add a little bit of complexity to the code
> to allow binder_thread_read() to sometimes return without the lock
> held.
>
> NOTE: to do this safely we need to make sure that we can safely do
> these things _without_ the global binder lock:
> * Muck with proc->ready_threads
> * Clear a bitfield in thread->looper
>
> It appears that those two operations don't need to be done together
> and it's OK to have different locking solutions for the two.  Thus:
>
> 1. We change thread->looper to atomic_t and use atomic ops to muck
>with it.  This means we're not clobbering someone else's work with
>our read/modify/write.
>
>Note that I haven't confirmed that every modify of "thread->looper"
>can be done without the binder mutex or without some
>coarser-grained lock.  ...but clearing the
>BINDER_LOOPER_STATE_WAITING bit should be fine.  The only place
>looking at it is binder_deferred_flush().  Also: while erroring out
>we also may clear BINDER_LOOPER_STATE_NEED_RETURN.  This looks to
>be OK, though the code isn't trivial to follow.
>
> 2. We add a new fine-grained mutex (per "proc" structure) to guard all
>of the "_threads" counters.  Technically the only value we're
>modifying is "proc->ready_threads" and we're just decrementing it
>(so maybe we could use atomic_t here, too?).  ...but it appears
>that all of the "_threads" work together so protecting them
>together seems to make the most sense.
>
>Note that to avoid deadlock:
>* We never first grab the fine grained mutex and _then_ the binder
>  mutex.
>* We always grab the fine grained mutex for short periods and never
>  do anything blocking while holding it.
>
> To sum it all up:
>
> 1. This patch does improve the chances that we will be able to kill a
>task quickly when we want to.  That means this patch has some
>merit.
> 2. Though this patch has merit, it isn't isn't actually all that
>critical because:
>2a) On newer kernels the OOM reaper should keep us from getting
>deadlocked.
>2b) Even on old kernels there are other deadlock cases we hit even
>if we fix binder in this way.
>
> Signed-off-by: Douglas Anderson 
> ---
> It's unclear t

Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 04:03 PM, Michal Hocko wrote:
> On Fri 24-02-17 15:42:49, peter enderborg wrote:
>> On 02/24/2017 03:11 PM, Michal Hocko wrote:
>>> On Fri 24-02-17 14:16:34, peter enderborg wrote:
>>>> On 02/24/2017 01:28 PM, Michal Hocko wrote:
>>> [...]
>>>>> Yeah, I strongly believe that the chosen approach is completely wrong.
>>>>> Both in abusing the shrinker interface and abusing oom_score_adj as the
>>>>> only criterion for the oom victim selection.
>>>> No one is arguing that shrinker is not problematic. And would be great
>>>> if it is removed from lmk.  The oom_score_adj is the way user-space
>>>> tells the kernel what the user-space has as prio. And android is using
>>>> that very much. It's a core part.
>>> Is there any documentation which describes how this is done?
>>>
>>>> I have never seen it be used on
>>>> other linux system so what is the intended usage of oom_score_adj? Is
>>>> this really abusing?
>>> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
>>> criterion on its own, well, except for the extreme sides of the range
>>> which are defined to enforce resp. disallow selecting the task. The
>>> global oom killer calculates the oom score as a function of the memory
>>> consumption. Your patch simply ignores the memory consumption (and uses
>>> pids to sort tasks with the same oom score which is just mind boggling)
>> How much it uses is of very little importance for android.
> But it is relevant for the global oom killer which is the main consumer of
> the oom_score_adj.
>
>> The score
>> used are only for apps and their services. System related are not
>> touched by android lmk. The pid is only to have a unique key to be
>> able to have it fast within a rbtree.  One idea was to use task_pid to
>> get a strict age of process to get a round robin but since it does not
>> matter i skipped that idea since it does not matter.
> Pid will not tell you anything about the age. Pids do wrap around.
>
>>> and that is what I call the abuse. The oom score calculation might
>>> change in future, of course, but all consumers of the oom_score_adj
>>> really have to agree on the base which is adjusted by this tunable
>>> otherwise you can see a lot of unexpected behavior.
>> Then can we just define a range that is strictly for user-space?
> This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
> is usable.
So we use them in userspace and kernel space but where is the abuse then?
>>> I would even argue that nobody outside of mm/oom_kill.c should really
>>> have any business with this tunable.  You can of course tweak the value
>>> from the userspace and help to chose a better oom victim this way but
>>> that is it.
>> Why only help? If userspace can give an exact order to kernel that
>> must be a good thing; other wise kernel have to guess and when
>> can that be better? 
> Because userspace doesn't know who is the best victim in 99% cases.
If user-space does not tell kernel what to it have to guess, android
user-space does, and maybe other should too.
> Android might be different, although, I am a bit skeptical - especially
> after hearing quite some complains about random application being
> killed... If you do believe that you know better then, by all means,
> implement your custom user space LMK and chose the oom victim on a
> different basis but try to understand that the global OOM killer is the
> last resort measure to make the system usable again. There is a good
> reason why the kernel uses the current badness calculation. The previous
> implementation which considered the process age ad other things was just
> too random to have a understandable behavior.
I think it make sense that there is only one way to describe what is
important what is not. And oom_kill is the last resort is one problem
for android. Android lowmemorykiller balance memory usage and
tries to be more proactive and that is why shrinkers work so well.
> In any case playing nasty games with the oom killer tunables might and
> will lead, well, to unexpected behavior.

I don't follow. If we only use values  OOM_SCORE_ADJ_{MIN,MAX} can
we then be "safe"?

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 03:11 PM, Michal Hocko wrote:
> On Fri 24-02-17 14:16:34, peter enderborg wrote:
>> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> [...]
>>> Yeah, I strongly believe that the chosen approach is completely wrong.
>>> Both in abusing the shrinker interface and abusing oom_score_adj as the
>>> only criterion for the oom victim selection.
>> No one is arguing that shrinker is not problematic. And would be great
>> if it is removed from lmk.  The oom_score_adj is the way user-space
>> tells the kernel what the user-space has as prio. And android is using
>> that very much. It's a core part.
> Is there any documentation which describes how this is done?
>
>> I have never seen it be used on
>> other linux system so what is the intended usage of oom_score_adj? Is
>> this really abusing?
> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> criterion on its own, well, except for the extreme sides of the range
> which are defined to enforce resp. disallow selecting the task. The
> global oom killer calculates the oom score as a function of the memory
> consumption. Your patch simply ignores the memory consumption (and uses
> pids to sort tasks with the same oom score which is just mind boggling)
How much it uses is of very little importance for android. The score
used are only for apps and their services. System related are not touched by
android lmk. The pid is only to have a unique key to be able to have it fast 
within a rbtree.
One idea was to use task_pid to get a strict age of process to get a round robin
but since it does not matter i skipped that idea since it does not matter.
> and that is what I call the abuse. The oom score calculation might
> change in future, of course, but all consumers of the oom_score_adj
> really have to agree on the base which is adjusted by this tunable
> otherwise you can see a lot of unexpected behavior.
Then can we just define a range that is strictly for user-space?
> I would even argue that nobody outside of mm/oom_kill.c should really
> have any business with this tunable.  You can of course tweak the value
> from the userspace and help to chose a better oom victim this way but
> that is it.
Why only help? If userspace can give an exact order to kernel that
must be a good thing; other wise kernel have to guess and when
can that be better? 
> Anyway, I guess we are getting quite off-topic here.
>

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/24/2017 01:28 PM, Michal Hocko wrote:
> On Fri 24-02-17 13:19:46, peter enderborg wrote:
>> On 02/23/2017 09:36 PM, Martijn Coenen wrote:
>>> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
>>>> So, just for context, Android does have a userland LMK daemon (using
>>>> the mempressure notifiers) as you mentioned, but unfortunately I'm
>>>> unaware of any devices that ship with that implementation.
>>> I've previously worked on enabling userspace lmkd for a previous
>>> release, but ran into some issues there (see below).
>>>
>>>> This is reportedly because while the mempressure notifiers provide a
>>>> the signal to userspace, the work the deamon then has to do to look up
>>>> per process memory usage, in order to figure out who is best to kill
>>>> at that point was too costly and resulted in poor device performance.
>>> In particular, mempressure requires memory cgroups to function, and we
>>> saw performance regressions due to the accounting done in mem cgroups.
>>> At the time we didn't have enough time left to solve this before the
>>> release, and we reverted back to kernel lmkd.
>>>
>>>> So for shipping Android devices, the LMK is still needed. However, its
>>>> not critical for basic android development, as the system will
>>>> function without it.
>>> It will function, but it most likely will perform horribly (as the
>>> page cache will be trashed to such a level that the system will be
>>> unusable).
>>>
>>>> Additionally I believe most vendors heavily
>>>> customize the LMK in their vendor tree, so the value of having it in
>>>> staging might be relatively low.
>>>>
>>>> It would be great however to get a discussion going here on what the
>>>> ulmkd needs from the kernel in order to efficiently determine who best
>>>> to kill, and how we might best implement that.
>>> The two main issues I think we need to address are:
>>> 1) Getting the right granularity of events from the kernel; I once
>>> tried to submit a patch upstream to address this:
>>> https://lkml.org/lkml/2016/2/24/582
>>> 2) Find out where exactly the memory cgroup overhead is coming from,
>>> and how to reduce it or work around it to acceptable levels for
>>> Android. This was also on 3.10, and maybe this has long been fixed or
>>> improved in more recent kernel versions.
>>>
>>> I don't have cycles to work on this now, but I'm happy to talk to
>>> whoever picks this up on the Android side.
>> I sent some patches that is different approach. It still uses shrinkers
>> but it has a kernel part that do the kill part better than the old one
>> but it does it the android way. The future for this is get it triggered
>> with other path's than slab shrinker. But we will not continue unless
>> we get google-android to be part of it. Hocko objected heavy on
>> the patches but seems not to see that we need something to
>> do the job before we can disconnect from shrinker.
> Yeah, I strongly believe that the chosen approach is completely wrong.
> Both in abusing the shrinker interface and abusing oom_score_adj as the
> only criterion for the oom victim selection.

No one is arguing that shrinker is not problematic. And would be great if it is 
removed from lmk.
The oom_score_adj is the way user-space tells the kernel what the user-space 
has as prio. And android
is using that very much. It's a core part. I have never seen it be used on other
linux system so what is the intended usage of oom_score_adj? Is this really 
abusing?

I think I can help out with removing  shrinker from lmk. Not using 
oom_score_adj is harder and
has a bigger impact on android, except the trivial solution by adding 
replacement
oom_user_prio and use that within android and kernel.


___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] staging, android: remove lowmemory killer from the tree

2017-02-24 Thread peter enderborg
On 02/23/2017 09:36 PM, Martijn Coenen wrote:
> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz  wrote:
>> So, just for context, Android does have a userland LMK daemon (using
>> the mempressure notifiers) as you mentioned, but unfortunately I'm
>> unaware of any devices that ship with that implementation.
> I've previously worked on enabling userspace lmkd for a previous
> release, but ran into some issues there (see below).
>
>> This is reportedly because while the mempressure notifiers provide a
>> the signal to userspace, the work the deamon then has to do to look up
>> per process memory usage, in order to figure out who is best to kill
>> at that point was too costly and resulted in poor device performance.
> In particular, mempressure requires memory cgroups to function, and we
> saw performance regressions due to the accounting done in mem cgroups.
> At the time we didn't have enough time left to solve this before the
> release, and we reverted back to kernel lmkd.
>
>> So for shipping Android devices, the LMK is still needed. However, its
>> not critical for basic android development, as the system will
>> function without it.
> It will function, but it most likely will perform horribly (as the
> page cache will be trashed to such a level that the system will be
> unusable).
>
>> Additionally I believe most vendors heavily
>> customize the LMK in their vendor tree, so the value of having it in
>> staging might be relatively low.
>>
>> It would be great however to get a discussion going here on what the
>> ulmkd needs from the kernel in order to efficiently determine who best
>> to kill, and how we might best implement that.
> The two main issues I think we need to address are:
> 1) Getting the right granularity of events from the kernel; I once
> tried to submit a patch upstream to address this:
> https://lkml.org/lkml/2016/2/24/582
> 2) Find out where exactly the memory cgroup overhead is coming from,
> and how to reduce it or work around it to acceptable levels for
> Android. This was also on 3.10, and maybe this has long been fixed or
> improved in more recent kernel versions.
>
> I don't have cycles to work on this now, but I'm happy to talk to
> whoever picks this up on the Android side.
I sent some patches that is different approach. It still uses shrinkers
but it has a kernel part that do the kill part better than the old one
but it does it the android way. The future for this is get it triggered
with other path's than slab shrinker. But we will not continue unless
we get google-android to be part of it. Hocko objected heavy on
the patches but seems not to see that we need something to
do the job before we can disconnect from shrinker.

> Thanks,
> Martijn
>
>> thanks
>> -john


___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-15 Thread peter enderborg
On 02/14/2017 05:51 PM, Greg KH wrote:
> On Tue, Feb 14, 2017 at 05:09:30PM +0100, peter.enderb...@sonymobile.com 
> wrote:
>> From: Peter Enderborg 
>>
>> This collects stats for shrinker calls and how much
>> waste work we do within the lowmemorykiller.
>>
>> Signed-off-by: Peter Enderborg 
> Wait, what changed from the previous versions of this patch?  Did you
> take the review comments into consideration, or is this just a resend of
> the original patches in a format that isn't corrupted?
>
> thanks,
>
> greg k-h

This is just a send with git-send-email that seems to work better. Nothing
else than tab-spaces should be different. I would like to have some positive
feedback from google/android before I start to send updated patches to the list.
If google are ready for the userspace solution this patch set is pointless for
upstream kernel.

Michal Hocko is very negative to hole thing, but we have addressed at least some
issues he pointed out on the list in 2015. Is there any idea to continue?

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-15 Thread peter enderborg
On 02/14/2017 05:50 PM, Greg KH wrote:
> On Tue, Feb 14, 2017 at 05:09:30PM +0100, peter.enderb...@sonymobile.com 
> wrote:
>> From: Peter Enderborg 
>>
>> This collects stats for shrinker calls and how much
>> waste work we do within the lowmemorykiller.
>>
>> Signed-off-by: Peter Enderborg 
>> ---
>>  drivers/staging/android/Kconfig | 11 
>>  drivers/staging/android/Makefile|  1 +
>>  drivers/staging/android/lowmemorykiller.c   |  9 ++-
>>  drivers/staging/android/lowmemorykiller_stats.c | 85 
>> +
>>  drivers/staging/android/lowmemorykiller_stats.h | 29 +
>>  5 files changed, 134 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/staging/android/lowmemorykiller_stats.c
>>  create mode 100644 drivers/staging/android/lowmemorykiller_stats.h
>>
>> diff --git a/drivers/staging/android/Kconfig 
>> b/drivers/staging/android/Kconfig
>> index 6c00d6f..96e86c7 100644
>> --- a/drivers/staging/android/Kconfig
>> +++ b/drivers/staging/android/Kconfig
>> @@ -24,6 +24,17 @@ config ANDROID_LOW_MEMORY_KILLER
>>scripts (/init.rc), and it defines priority values with minimum free 
>> memory size
>>for each priority.
>>  
>> +config ANDROID_LOW_MEMORY_KILLER_STATS
>> +bool "Android Low Memory Killer: collect statistics"
>> +depends on ANDROID_LOW_MEMORY_KILLER
>> +default n
>> +help
>> +  Create a file in /proc/lmkstats that includes
>> +  collected statistics about kills, scans and counts
>> +  and  interaction with the shrinker. Its content
>> +  will be different depeding on lmk implementation used.
> Ick, no new /proc files please, this isn't a "process" value.  What's
> wrong with debugfs?
This is intended for android. Android users are very limited in their access
to linux part of the system on commercial models and lmk activity has a bad 
impact on the performance
of the device. Even the application developers has not much access so it seems 
to be fair to give
the users the information about why there is a problem.
> Also note the minor '  ' usage in your first sentence of the help text.
>
>>  source "drivers/staging/android/ion/Kconfig"
>>  
>>  endif # if ANDROID
>> diff --git a/drivers/staging/android/Makefile 
>> b/drivers/staging/android/Makefile
>> index 7ed1be7..d710eb2 100644
>> --- a/drivers/staging/android/Makefile
>> +++ b/drivers/staging/android/Makefile
>> @@ -4,3 +4,4 @@ obj-y+= ion/
>>  
>>  obj-$(CONFIG_ASHMEM)+= ashmem.o
>>  obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER) += lowmemorykiller.o
>> +obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER_STATS)   += 
>> lowmemorykiller_stats.o
>> diff --git a/drivers/staging/android/lowmemorykiller.c 
>> b/drivers/staging/android/lowmemorykiller.c
>> index ec3b665..15c1b38 100644
>> --- a/drivers/staging/android/lowmemorykiller.c
>> +++ b/drivers/staging/android/lowmemorykiller.c
>> @@ -42,6 +42,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include "lowmemorykiller_stats.h"
>>  
>>  static u32 lowmem_debug_level = 1;
>>  static short lowmem_adj[6] = {
>> @@ -72,6 +73,7 @@ static unsigned long lowmem_deathpending_timeout;
>>  static unsigned long lowmem_count(struct shrinker *s,
>>struct shrink_control *sc)
>>  {
>> +lmk_inc_stats(LMK_COUNT);
>>  return global_node_page_state(NR_ACTIVE_ANON) +
>>  global_node_page_state(NR_ACTIVE_FILE) +
>>  global_node_page_state(NR_INACTIVE_ANON) +
>> @@ -95,6 +97,7 @@ static unsigned long lowmem_scan(struct shrinker *s, 
>> struct shrink_control *sc)
>>  global_node_page_state(NR_SHMEM) -
>>  total_swapcache_pages();
>>  
>> +lmk_inc_stats(LMK_SCAN);
>>  if (lowmem_adj_size < array_size)
>>  array_size = lowmem_adj_size;
>>  if (lowmem_minfree_size < array_size)
>> @@ -134,6 +137,7 @@ static unsigned long lowmem_scan(struct shrinker *s, 
>> struct shrink_control *sc)
>>  if (task_lmk_waiting(p) &&
>>  time_before_eq(jiffies, lowmem_deathpending_timeout)) {
>>  task_unlock(p);
>> +lmk_inc_stats(LMK_TIMEOUT);
>>  rcu_read_unlock();
>>  return 0;
>>   

Re: [PATCH 0/3 staging-next] android: Lowmemmorykiller task tree

2017-02-13 Thread peter enderborg
On 02/10/2017 11:27 AM, Michal Hocko wrote:
> [I have only now see this cover - it answers some of the questions I've
>  had to specific patches. It would be really great if you could use git
>  send-email to post patch series - it just does the right thing(tm)]
>
> On Thu 09-02-17 14:21:40, peter enderborg wrote:
>> Lowmemorykiller efficiency problem and a solution.
>>
>> Lowmemorykiller in android has a severe efficiency problem. The basic
>> problem is that the registered shrinker gets called very often without
>>  anything actually happening.
> Which is an inherent problem because lkml doesn't belong to shrinkers
> infrastructure.

Not really what this patch address.  I see it as a problem with shrinker
that there no slow-path-free (scan/count) where it should belong.
This patch address a specific problem where lot of cpu are wasted
in low memory conditions.

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-13 Thread peter enderborg
On 02/10/2017 10:15 AM, Michal Hocko wrote:
> On Fri 10-02-17 10:05:34, peter enderborg wrote:
>> On 02/10/2017 08:59 AM, Michal Hocko wrote:
> [...]
>>> The approach was wrong from the day 1. Abusing slab shrinkers
>>> is just a bad place to stick this logic. This all belongs to the
>>> userspace.
>> But now it is there and we have to stick with it.
> It is also adding maintenance cost. Just have a look at the git log and
> check how many patches were just a result of the core changes which
> needed a sync.
>
> I seriously doubt that any of the android devices can run natively on
> the Vanilla kernel so insisting on keeping this code in staging doesn't
> give much sense to me.

I guess that we more than a few that would like to see that.

We have

http://developer.sonymobile.com/open-devices/how-to-build-and-flash-a-linux-kernel/how-to-build-mainline-linux-for-xperia-devices/

It is not the latest on anything and it is not on par with commercial bundled 
software.

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-10 Thread peter enderborg
On 02/10/2017 08:59 AM, Michal Hocko wrote:
> On Fri 10-02-17 08:51:49, Greg KH wrote:
>> On Fri, Feb 10, 2017 at 08:21:32AM +0100, peter enderborg wrote:
> [...]
>>> Until then we have to polish this version as good as we can. It is
>>> essential for android as it is now.
>> But if no one is willing to do the work to fix the reported issues, why
>> should it remain?  Can you do the work here?  You're already working on
>> fixing some of the issues in a differnt way, why not do the "real work"
>> here instead for everyone to benifit from?
> Well, to be honest, I do not think that the current code is easily
> fixable. 
This patch improves the current situation and address some
of the issues that makes android devices behaviour different
than other linux systems.
> The approach was wrong from the day 1. Abusing slab shrinkers
> is just a bad place to stick this logic. This all belongs to the
> userspace.
But now it is there and we have to stick with it.
>  For that we need a proper mm pressure notification which is
> supposed to be vmpressure but that one also doesn't seem to work all
> that great. So rather than trying to fix unfixable I would stronly
> suggest focusing on making vmpressure work reliably.


___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-10 Thread peter enderborg
On 02/10/2017 08:51 AM, Greg Kroah-Hartman wrote:
> On Fri, Feb 10, 2017 at 08:21:32AM +0100, peter enderborg wrote:
>> Im not speaking for google, but I think there is a work ongoing to
>> replace this with user-space code.
> Really?  I have not heard this at all, any pointers to whom in Google is
> doing it?
>
I think it was mention some of the google conferences. The idea
is the lmkd that uses memory pressure events to trigger this.
>From git log in lmkd i think Colin Cross is involved.

>> Until then we have to polish this version as good as we can. It is
>> essential for android as it is now.
> But if no one is willing to do the work to fix the reported issues, why
> should it remain? 
It is needed by billions of phones.
>  Can you do the work here? 
No. Change the kernel is only one small part of the solution.
>  You're already working on
> fixing some of the issues in a differnt way, why not do the "real work"
> here instead for everyone to benifit from?
The long term solution is something from AOSP.  As you know
we tried to contribute this to AOSP.  As OEM we can't turn android
upside down.  It has to be a step by step.
> thanks,
>
> greg k-h
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 3/3 staging-next] mm: Remove RCU and tasklocks from lmk

2017-02-09 Thread peter enderborg
On 02/09/2017 09:05 PM, Michal Hocko wrote:
> On Thu 09-02-17 14:21:52, peter enderborg wrote:
>> Fundamental changes:
>> 1 Does NOT take any RCU lock in shrinker functions.
>> 2 It returns same result for scan and counts, so  we dont need to do
>>   shinker will know when it is pointless to call scan.
>> 3 It does not lock any other process than the one that is
>>   going to be killed.
>>
>> Background.
>> The low memory killer scans for process that can be killed to free
>> memory. This can be cpu consuming when there is a high demand for
>> memory. This can be seen by analysing the kswapd0 task work.
>> The stats function added in earler patch adds a counter for waste work.
>>
>> How it works.
>> This patch create a structure within the lowmemory killer that caches
>> the user spaces processes that it might kill. It is done with a
>> sorted rbtree so we can very easy find the candidate to be killed,
>> and knows its properies as memory usage and sorted by oom_score_adj
>> to look up the task with highest oom_score_adj. To be able to achive
>> this it uses oom_score_notify events.
>>
>> This patch also as a other effect, we are now free to do other
>> lowmemorykiller configurations.  Without the patch there is a need
>> for a tradeoff between freed memory and task and rcu locks. This
>> is no longer a concern for tuning lmk. This patch is not intended
>> to do any calculation changes other than we do use the cache for
>> calculate the count values and that makes kswapd0 to shrink other
>> areas.
> I have to admit I really do not understand big part of the above
> paragraph as well as how this all is supposed to work. A quick glance
> over the implementation. __lmk_task_insert seems to be only called from
> the oom_score notifier context. If nobody updates the value then no task
> will get into the tree. Or am I missing something really obvious here?
> Moreover oom scores tend to be mostly same for tasks. That means that
> your sorted tree will become sorted by pids in most cases. I do not see
> any sorting based on the rss nor any updates that would reflect updates
> of rss. How can this possibly work?

The task tree nodes are created,updated or removed from the notifier when
there is a relevant oom_score_adj change. If no one create a task that
is in the range for the lowmemorykiller the tree will be empty. This is
an android feature so the score will be updated very often. It is
part of activity manager to prioritise tasks.  Why should we do sort of
rss?


___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-09 Thread peter enderborg
Im not speaking for google, but I think there is a work ongoing to
replace this with user-space code. Until then we have to polish
this version as good as we can. It is essential for android as it is now.

On 02/09/2017 09:54 PM, Michal Hocko wrote:
> On Thu 09-02-17 21:07:37, Greg KH wrote:
>> On Thu, Feb 09, 2017 at 08:26:41PM +0100, Michal Hocko wrote:
>>> On Thu 09-02-17 14:21:45, peter enderborg wrote:
>>>> This collects stats for shrinker calls and how much
>>>> waste work we do within the lowmemorykiller.
>>> This doesn't explain why do we need this information and who is going to
>>> use it. Not to mention it exports it in /proc which is considered a
>>> stable user API. This is a no-go, especially for something that is still
>>> lingering in the staging tree without any actuall effort to make it
>>> fully supported MM feature. I am actually strongly inclined to simply
>>> drop lmk from the tree completely.
>> I thought that someone was working to get the "native" mm features to
>> work properly with the lmk "feature"  Do you recall if that work got
>> rejected, or just never happened?
> Never happened AFAIR. There were some attempts to tune the current
> behavior which has been rejected for one reason or another but I am not
> really aware of anybody working on moving the code from staging area.
>
> I already have this in the to-send queue, just didn't get to post it yet
> because I planned to polish the reasoning some more.
> ---
> From 9f871b54a387e0a7cdfaf0fa256d1440093e427c Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Wed, 1 Feb 2017 10:37:30 +0100
> Subject: [PATCH] staging, android: remove lowmemory killer from the tree
>
> Lowmemory killer is sitting in the staging tree since 2008 without any
> serious interest for fixing issues brought up by the MM folks. The main
> objection is that the implementation is basically broken by design:
>   - it hooks into slab shrinker API which is not suitable for this
> purpose. lowmem_count implementation just shows this nicely.
> There is no scaling based on the memory pressure and no
> feedback to the generic shrinker infrastructure.
>   - it is not reclaim context aware - no NUMA and/or memcg
> awareness.
>
> As the code stands right now it just adds a maintenance overhead when
> core MM changes have to update lowmemorykiller.c as well.
>
> Signed-off-by: Michal Hocko 
> ---
>  drivers/staging/android/Kconfig   |  10 --
>  drivers/staging/android/Makefile  |   1 -
>  drivers/staging/android/lowmemorykiller.c | 212 
> --
>  include/linux/sched.h |   4 -
>  4 files changed, 227 deletions(-)
>  delete mode 100644 drivers/staging/android/lowmemorykiller.c
>
> diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
> index 6c00d6f765c6..71a50b99caff 100644
> --- a/drivers/staging/android/Kconfig
> +++ b/drivers/staging/android/Kconfig
> @@ -14,16 +14,6 @@ config ASHMEM
> It is, in theory, a good memory allocator for low-memory devices,
> because it can discard shared memory units when under memory pressure.
>  
> -config ANDROID_LOW_MEMORY_KILLER
> - bool "Android Low Memory Killer"
> - ---help---
> -   Registers processes to be killed when low memory conditions, this is 
> useful
> -   as there is no particular swap space on android.
> -
> -   The registered process will kill according to the priorities in 
> android init
> -   scripts (/init.rc), and it defines priority values with minimum free 
> memory size
> -   for each priority.
> -
>  source "drivers/staging/android/ion/Kconfig"
>  
>  endif # if ANDROID
> diff --git a/drivers/staging/android/Makefile 
> b/drivers/staging/android/Makefile
> index 7ed1be798909..7cf1564a49a5 100644
> --- a/drivers/staging/android/Makefile
> +++ b/drivers/staging/android/Makefile
> @@ -3,4 +3,3 @@ ccflags-y += -I$(src) # needed for trace 
> events
>  obj-y+= ion/
>  
>  obj-$(CONFIG_ASHMEM) += ashmem.o
> -obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)  += lowmemorykiller.o
> diff --git a/drivers/staging/android/lowmemorykiller.c 
> b/drivers/staging/android/lowmemorykiller.c
> deleted file mode 100644
> index ec3b66561412..
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ /dev/null
> @@ -1,212 +0,0 @@
> -/* drivers/misc/lowmemorykiller.c
> - *
> - * The lowmemorykiller driver lets user-space specify a set of memory 
> thresholds
> - * where proc

Re: [PATCH 1/3 v2 staging-next] android: Collect statistics from lowmemorykiller

2017-02-09 Thread peter enderborg
On 02/09/2017 09:13 PM, Greg Kroah-Hartman wrote:
> On Thu, Feb 09, 2017 at 04:42:35PM +0100, peter enderborg wrote:
>> This collects stats for shrinker calls and how much
>> waste work we do within the lowmemorykiller.
>>
>> Signed-off-by: Peter Enderborg 
>> ---
>>  drivers/staging/android/Kconfig | 11 
>>  drivers/staging/android/Makefile|  1 +
>>  drivers/staging/android/lowmemorykiller.c   |  9 ++-
>>  drivers/staging/android/lowmemorykiller_stats.c | 85 
>> +
>>  drivers/staging/android/lowmemorykiller_stats.h | 29 +
>>  5 files changed, 134 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/staging/android/lowmemorykiller_stats.c
>>  create mode 100644 drivers/staging/android/lowmemorykiller_stats.h
> What changed from v1?
Nothing. I thought I found the reason why my tabs are replaced by spaces in 
transport.

>> @@ -72,6 +73,7 @@ static unsigned long lowmem_deathpending_timeout;
>>  static unsigned long lowmem_count(struct shrinker *s,
>>struct shrink_control *sc)
>>  {
>> +lmk_inc_stats(LMK_COUNT);
>>  return global_node_page_state(NR_ACTIVE_ANON) +
>>  global_node_page_state(NR_ACTIVE_FILE) +
>>  global_node_page_state(NR_INACTIVE_ANON) +
> Your email client is eating tabs and spitting out spaces, making this
> impossible to even consider being merged :(
>
> Please fix your email client, documentation for how to do so is in the
> kernel Documentation directory.
>
> thanks,
>
> greg k-h


___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 1/3 v2 staging-next] android: Collect statistics from lowmemorykiller

2017-02-09 Thread peter enderborg
This collects stats for shrinker calls and how much
waste work we do within the lowmemorykiller.

Signed-off-by: Peter Enderborg 
---
 drivers/staging/android/Kconfig | 11 
 drivers/staging/android/Makefile|  1 +
 drivers/staging/android/lowmemorykiller.c   |  9 ++-
 drivers/staging/android/lowmemorykiller_stats.c | 85 +
 drivers/staging/android/lowmemorykiller_stats.h | 29 +
 5 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/android/lowmemorykiller_stats.c
 create mode 100644 drivers/staging/android/lowmemorykiller_stats.h

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 6c00d6f..96e86c7 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -24,6 +24,17 @@ config ANDROID_LOW_MEMORY_KILLER
   scripts (/init.rc), and it defines priority values with minimum free 
memory size
   for each priority.
 
+config ANDROID_LOW_MEMORY_KILLER_STATS
+bool "Android Low Memory Killer: collect statistics"
+depends on ANDROID_LOW_MEMORY_KILLER
+default n
+help
+  Create a file in /proc/lmkstats that includes
+  collected statistics about kills, scans and counts
+  and  interaction with the shrinker. Its content
+  will be different depeding on lmk implementation used.
+
+
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7ed1be7..d710eb2 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -4,3 +4,4 @@ obj-y+= ion/
 
 obj-$(CONFIG_ASHMEM)+= ashmem.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o
+obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER_STATS)+= lowmemorykiller_stats.o
diff --git a/drivers/staging/android/lowmemorykiller.c 
b/drivers/staging/android/lowmemorykiller.c
index ec3b665..15c1b38 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include "lowmemorykiller_stats.h"
 
 static u32 lowmem_debug_level = 1;
 static short lowmem_adj[6] = {
@@ -72,6 +73,7 @@ static unsigned long lowmem_deathpending_timeout;
 static unsigned long lowmem_count(struct shrinker *s,
   struct shrink_control *sc)
 {
+lmk_inc_stats(LMK_COUNT);
 return global_node_page_state(NR_ACTIVE_ANON) +
 global_node_page_state(NR_ACTIVE_FILE) +
 global_node_page_state(NR_INACTIVE_ANON) +
@@ -95,6 +97,7 @@ static unsigned long lowmem_scan(struct shrinker *s, struct 
shrink_control *sc)
 global_node_page_state(NR_SHMEM) -
 total_swapcache_pages();
 
+lmk_inc_stats(LMK_SCAN);
 if (lowmem_adj_size < array_size)
 array_size = lowmem_adj_size;
 if (lowmem_minfree_size < array_size)
@@ -134,6 +137,7 @@ static unsigned long lowmem_scan(struct shrinker *s, struct 
shrink_control *sc)
 if (task_lmk_waiting(p) &&
 time_before_eq(jiffies, lowmem_deathpending_timeout)) {
 task_unlock(p);
+lmk_inc_stats(LMK_TIMEOUT);
 rcu_read_unlock();
 return 0;
 }
@@ -179,7 +183,9 @@ static unsigned long lowmem_scan(struct shrinker *s, struct 
shrink_control *sc)
  other_free * (long)(PAGE_SIZE / 1024));
 lowmem_deathpending_timeout = jiffies + HZ;
 rem += selected_tasksize;
-}
+lmk_inc_stats(LMK_KILL);
+} else
+lmk_inc_stats(LMK_WASTE);
 
 lowmem_print(4, "lowmem_scan %lu, %x, return %lu\n",
  sc->nr_to_scan, sc->gfp_mask, rem);
@@ -196,6 +202,7 @@ static struct shrinker lowmem_shrinker = {
 static int __init lowmem_init(void)
 {
 register_shrinker(&lowmem_shrinker);
+init_procfs_lmk();
 return 0;
 }
 device_initcall(lowmem_init);
diff --git a/drivers/staging/android/lowmemorykiller_stats.c 
b/drivers/staging/android/lowmemorykiller_stats.c
new file mode 100644
index 000..673691c
--- /dev/null
+++ b/drivers/staging/android/lowmemorykiller_stats.c
@@ -0,0 +1,85 @@
+/*
+ *  lowmemorykiller_stats
+ *
+ *  Copyright (C) 2017 Sony Mobile Communications Inc.
+ *
+ *  Author: Peter Enderborg 
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+/* This code is bookkeeping of statistical information
+ * from lowmemorykiller and provide a node in proc "/proc/lmkstats".
+ */
+
+#include 
+#include 
+#include "lowmemorykiller_stats.h"
+
+struct lmk_stats {
+atomic_long_t scans; /* counter as in shrinker scans */
+atomic_long_t kills; /* the number of sigkills sent */
+atomic_long_t waste

[PATCH 3/3 staging-next] mm: Remove RCU and tasklocks from lmk

2017-02-09 Thread peter enderborg

Fundamental changes:
1 Does NOT take any RCU lock in shrinker functions.
2 It returns same result for scan and counts, so  we dont need to do
  shinker will know when it is pointless to call scan.
3 It does not lock any other process than the one that is
  going to be killed.

Background.
The low memory killer scans for process that can be killed to free
memory. This can be cpu consuming when there is a high demand for
memory. This can be seen by analysing the kswapd0 task work.
The stats function added in earler patch adds a counter for waste work.

How it works.
This patch create a structure within the lowmemory killer that caches
the user spaces processes that it might kill. It is done with a
sorted rbtree so we can very easy find the candidate to be killed,
and knows its properies as memory usage and sorted by oom_score_adj
to look up the task with highest oom_score_adj. To be able to achive
this it uses oom_score_notify events.

This patch also as a other effect, we are now free to do other
lowmemorykiller configurations.  Without the patch there is a need
for a tradeoff between freed memory and task and rcu locks. This
is no longer a concern for tuning lmk. This patch is not intended
to do any calculation changes other than we do use the cache for
calculate the count values and that makes kswapd0 to shrink other
areas.

Signed-off-by: Peter Enderborg 
---
 drivers/staging/android/Kconfig |   1 +
 drivers/staging/android/Makefile|   1 +
 drivers/staging/android/lowmemorykiller.c   | 294 +++-
 drivers/staging/android/lowmemorykiller.h   |  15 ++
 drivers/staging/android/lowmemorykiller_stats.c |  24 ++
 drivers/staging/android/lowmemorykiller_stats.h |  14 +-
 drivers/staging/android/lowmemorykiller_tasks.c | 220 ++
 drivers/staging/android/lowmemorykiller_tasks.h |  35 +++
 8 files changed, 498 insertions(+), 106 deletions(-)
 create mode 100644 drivers/staging/android/lowmemorykiller.h
 create mode 100644 drivers/staging/android/lowmemorykiller_tasks.c
 create mode 100644 drivers/staging/android/lowmemorykiller_tasks.h

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 96e86c7..899186c 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -16,6 +16,7 @@ config ASHMEM

 config ANDROID_LOW_MEMORY_KILLER
 bool "Android Low Memory Killer"
+select OOM_SCORE_NOTIFIER
 ---help---
   Registers processes to be killed when low memory conditions, this is 
useful
   as there is no particular swap space on android.
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index d710eb2..b7a8036 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -4,4 +4,5 @@ obj-y+= ion/

 obj-$(CONFIG_ASHMEM)+= ashmem.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o
+obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller_tasks.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER_STATS)+= lowmemorykiller_stats.o
diff --git a/drivers/staging/android/lowmemorykiller.c 
b/drivers/staging/android/lowmemorykiller.c
index 15c1b38..1e275b1 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -41,10 +41,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include "lowmemorykiller.h"
 #include "lowmemorykiller_stats.h"
+#include "lowmemorykiller_tasks.h"

-static u32 lowmem_debug_level = 1;
+u32 lowmem_debug_level = 1;
 static short lowmem_adj[6] = {
 0,
 1,
@@ -62,135 +66,212 @@ static int lowmem_minfree[6] = {

 static int lowmem_minfree_size = 4;

-static unsigned long lowmem_deathpending_timeout;
-
-#define lowmem_print(level, x...)\
-do {\
-if (lowmem_debug_level >= (level))\
-pr_info(x);\
-} while (0)
-
-static unsigned long lowmem_count(struct shrinker *s,
-  struct shrink_control *sc)
-{
-lmk_inc_stats(LMK_COUNT);
-return global_node_page_state(NR_ACTIVE_ANON) +
-global_node_page_state(NR_ACTIVE_FILE) +
-global_node_page_state(NR_INACTIVE_ANON) +
-global_node_page_state(NR_INACTIVE_FILE);
-}
+struct calculated_params {
+long selected_tasksize;
+long minfree;
+int other_file;
+int other_free;
+int dynamic_max_queue_len;
+short selected_oom_score_adj;
+short min_score_adj;
+};

-static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
+static int kill_needed(int level, struct shrink_control *sc,
+   struct calculated_params *cp)
 {
-struct task_struct *tsk;
-struct task_struct *selected = NULL;
-unsigned long rem = 0;
-int tasksize;
 int i;
-short min_score_adj = OOM_SCORE_ADJ_MAX + 1;
-int minfree = 0;
-int selected_task

[PATCH 0/3 staging-next] android: Lowmemmorykiller task tree

2017-02-09 Thread peter enderborg

Lowmemorykiller efficiency problem and a solution.

Lowmemorykiller in android has a severe efficiency problem. The basic
problem is that the registered shrinker gets called very often without
 anything actually happening. This is in some cases not a problem as
it is a simple calculation that returns a value. But when there is
high pressure on memory and we get to start killing processes to free
memory we get some heavy work that does lots of cpu processing and
lock holding for no real benefit. This occurs when we are below the
first threshold level in minfree. We call that waste. To see this
problem we introduce a patch that collects statistics from
lowmemorykiller. We collect the amount of kills, scans, counts and
some other metrics. One of this metrics is called waste. These metrics
are presented in procfs as /proc/lmkstats.

Patchset:
0001-android-Collect-statistics-from-lowmemorykiller.patch
0002-oom-Add-notification-for-oom_score_adj.patch
0003-mm-Remove-RCU-and-tasklocks-from-lmk.patch


Collect-statistics-from-lowmemorykiller.patch
-
This patch only adds metrics and is there to show
behavour before and after and is a good way to
see that the device is in waste zone.


0002-oom-Add-notification-for-oom_score_adj.patch

This is the prerequisite patch to be able to do
the lowmemorykiller change. It introduces notifiers
for oom_score_adj. It generates notifier events for
process creation and death, and when process values
are changed.  These patches are outside from stageing
drivers and are applied to core functions in e.g. fork.c.

0003-mm-Remove-RCU-and-tasklocks-from-lmk.patch
---
This patch is the change of lowmemorykiller. It
builds a tree structure that works as cache for
the task list, but only contains the tasks that
are relevant for the lmk. The key thing here is
that the cache is sorted based on the oom_score_adj
value so the scan and count function can find
the right task with only a tree first operation.
Based on the right task the count can give a
proper reply and give a right estimate of the
amount it will free, and more important when
it is not willing to free anything. This makes
the shrinker not to call the scan function at all,
and when it is called it actually do what it's
supposed to do that is to free up some memory.
I consider this as mm based on the behaviour
changes for the shrinker even if the code is
a driver.

About testing.
Reproduce the problem. For this the first patch is needed and enabeld.
It does not change the lowmemory killer other than it add some metrics.
One counter is called WASTE. This is what this patch-set is about.
In android environment this can be tested directly. On other systems
like fedora a method using the stress package can be used. Apply the
patches. (First with only metrics) then in your

shell: echo 400 > /proc/self/oom_score_adj

Now you have created a shell that has something that can be killed.
In the same shell use stress program. The parameters will be very
dependent on your configuration, but you need to run out of memmory.

Most of the wasted cpu cycles are accounted in kswapd0 task so a compare
of the reduced waste can also be seen in the schedstat for that task.
However activitymanager will get some more work done in kernel space.
Finaly the new version also has the WASTE counter, but this one is
the cost of only a rbtree search.

Cost/Drawback
The impact on the fork call is on a 2ghz arm64 is about 500ns for the
notifier.

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 2/3 staging-next] oom: Add notification for oom_score_adj

2017-02-09 Thread peter enderborg

This adds subscribtion for changes in oom_score_adj, this
value is important to android systems. For task that uses
oom_score_adj they read the task list. This can be long
and need rcu locks and has a impact on the system. Let
the user track the changes based on oom_score_adj changes
and keep them in their own context so they do their actions
with minimal system impact.

Signed-off-by: Peter Enderborg 
---
 fs/proc/base.c | 13 +++
 include/linux/oom_score_notifier.h | 47 
 kernel/Makefile|  1 +
 kernel/fork.c  |  6 +++
 kernel/oom_score_notifier.c| 75 ++
 mm/Kconfig |  9 +
 6 files changed, 151 insertions(+)
 create mode 100644 include/linux/oom_score_notifier.h
 create mode 100644 kernel/oom_score_notifier.c

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 87c9a9a..60c2d9b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -87,6 +87,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_HARDWALL
 #include 
 #endif
@@ -1057,6 +1058,7 @@ static int __set_oom_adj(struct file *file, int oom_adj, 
bool legacy)
 static DEFINE_MUTEX(oom_adj_mutex);
 struct mm_struct *mm = NULL;
 struct task_struct *task;
+int old_oom_score_adj;
 int err = 0;

 task = get_proc_task(file_inode(file));
@@ -1102,9 +1104,20 @@ static int __set_oom_adj(struct file *file, int oom_adj, 
bool legacy)
 }
 }

+old_oom_score_adj = task->signal->oom_score_adj;
 task->signal->oom_score_adj = oom_adj;
 if (!legacy && has_capability_noaudit(current, CAP_SYS_RESOURCE))
 task->signal->oom_score_adj_min = (short)oom_adj;
+
+#ifdef CONFIG_OOM_SCORE_NOTIFIER
+err = oom_score_notify_update(task, old_oom_score_adj);
+if (err) {
+/* rollback and error handle. */
+task->signal->oom_score_adj = old_oom_score_adj;
+goto err_unlock;
+}
+#endif
+
 trace_oom_score_adj_update(task);

 if (mm) {
diff --git a/include/linux/oom_score_notifier.h 
b/include/linux/oom_score_notifier.h
new file mode 100644
index 000..c5cea47
--- /dev/null
+++ b/include/linux/oom_score_notifier.h
@@ -0,0 +1,47 @@
+/*
+ *  oom_score_notifier interface
+ *  Copyright (C) 2017 Sony Mobile Communications Inc.
+ *
+ *  Author: Peter Enderborg 
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_OOM_SCORE_NOTIFIER_H
+#define _LINUX_OOM_SCORE_NOTIFIER_H
+
+#ifdef CONFIG_OOM_SCORE_NOTIFIER
+
+#include 
+#include 
+#include 
+
+enum osn_msg_type {
+OSN_NEW,
+OSN_FREE,
+OSN_UPDATE
+};
+
+extern struct atomic_notifier_head oom_score_notifier;
+extern int oom_score_notifier_register(struct notifier_block *n);
+extern int oom_score_notifier_unregister(struct notifier_block *n);
+extern int oom_score_notify_free(struct task_struct *tsk);
+extern int oom_score_notify_new(struct task_struct *tsk);
+extern int oom_score_notify_update(struct task_struct *tsk, int old_score);
+
+struct oom_score_notifier_struct {
+struct task_struct *tsk;
+int old_score;
+};
+
+#else
+
+#define oom_score_notify_free(t)  do {} while (0)
+#define oom_score_notify_new(t) false
+#define oom_score_notify_update(t, s) do {} while (0)
+
+#endif /* CONFIG_OOM_SCORE_NOTIFIER */
+
+#endif /* _LINUX_OOM_SCORE_NOTIFIER_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 12c679f..747c66c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_TRACEPOINTS) += tracepoint.o
+obj-$(CONFIG_OOM_SCORE_NOTIFIER) += oom_score_notifier.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 obj-$(CONFIG_ELFCORE) += elfcore.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace/
diff --git a/kernel/fork.c b/kernel/fork.c
index 11c5c8a..f8a1a89 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -73,6 +73,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -391,6 +392,7 @@ void __put_task_struct(struct task_struct *tsk)
 exit_creds(tsk);
 delayacct_tsk_free(tsk);
 put_signal_struct(tsk->signal);
+oom_score_notify_free(tsk);

 if (!profile_handoff_task(tsk))
 free_task(tsk);
@@ -1790,6 +1792,10 @@ static __latent_entropy struct task_struct *copy_process(

 init_task_pid(p, PIDTYPE_PID, pid);
 if (thread_group_leader(p)) {
+retval = oom_score_notify_new(p);
+if (retval)
+goto bad_fork_cancel_cgroup;
+
 init_task_pid(p, PIDTYPE_PGID, task_pgrp(current));
 init_task_pid(p, PIDTYPE_SID, task_session(current));

diff --git a/kernel/oom_score_notifier.c b/kernel/oom_score_no

[PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-09 Thread peter enderborg

This collects stats for shrinker calls and how much
waste work we do within the lowmemorykiller.

Signed-off-by: Peter Enderborg 
---
 drivers/staging/android/Kconfig | 11 
 drivers/staging/android/Makefile|  1 +
 drivers/staging/android/lowmemorykiller.c   |  9 ++-
 drivers/staging/android/lowmemorykiller_stats.c | 85 +
 drivers/staging/android/lowmemorykiller_stats.h | 29 +
 5 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/android/lowmemorykiller_stats.c
 create mode 100644 drivers/staging/android/lowmemorykiller_stats.h

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 6c00d6f..96e86c7 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -24,6 +24,17 @@ config ANDROID_LOW_MEMORY_KILLER
   scripts (/init.rc), and it defines priority values with minimum free 
memory size
   for each priority.

+config ANDROID_LOW_MEMORY_KILLER_STATS
+bool "Android Low Memory Killer: collect statistics"
+depends on ANDROID_LOW_MEMORY_KILLER
+default n
+help
+  Create a file in /proc/lmkstats that includes
+  collected statistics about kills, scans and counts
+  and  interaction with the shrinker. Its content
+  will be different depeding on lmk implementation used.
+
+
 source "drivers/staging/android/ion/Kconfig"

 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7ed1be7..d710eb2 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -4,3 +4,4 @@ obj-y+= ion/

 obj-$(CONFIG_ASHMEM)+= ashmem.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o
+obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER_STATS)+= lowmemorykiller_stats.o
diff --git a/drivers/staging/android/lowmemorykiller.c 
b/drivers/staging/android/lowmemorykiller.c
index ec3b665..15c1b38 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include "lowmemorykiller_stats.h"

 static u32 lowmem_debug_level = 1;
 static short lowmem_adj[6] = {
@@ -72,6 +73,7 @@ static unsigned long lowmem_deathpending_timeout;
 static unsigned long lowmem_count(struct shrinker *s,
   struct shrink_control *sc)
 {
+lmk_inc_stats(LMK_COUNT);
 return global_node_page_state(NR_ACTIVE_ANON) +
 global_node_page_state(NR_ACTIVE_FILE) +
 global_node_page_state(NR_INACTIVE_ANON) +
@@ -95,6 +97,7 @@ static unsigned long lowmem_scan(struct shrinker *s, struct 
shrink_control *sc)
 global_node_page_state(NR_SHMEM) -
 total_swapcache_pages();

+lmk_inc_stats(LMK_SCAN);
 if (lowmem_adj_size < array_size)
 array_size = lowmem_adj_size;
 if (lowmem_minfree_size < array_size)
@@ -134,6 +137,7 @@ static unsigned long lowmem_scan(struct shrinker *s, struct 
shrink_control *sc)
 if (task_lmk_waiting(p) &&
 time_before_eq(jiffies, lowmem_deathpending_timeout)) {
 task_unlock(p);
+lmk_inc_stats(LMK_TIMEOUT);
 rcu_read_unlock();
 return 0;
 }
@@ -179,7 +183,9 @@ static unsigned long lowmem_scan(struct shrinker *s, struct 
shrink_control *sc)
  other_free * (long)(PAGE_SIZE / 1024));
 lowmem_deathpending_timeout = jiffies + HZ;
 rem += selected_tasksize;
-}
+lmk_inc_stats(LMK_KILL);
+} else
+lmk_inc_stats(LMK_WASTE);

 lowmem_print(4, "lowmem_scan %lu, %x, return %lu\n",
  sc->nr_to_scan, sc->gfp_mask, rem);
@@ -196,6 +202,7 @@ static struct shrinker lowmem_shrinker = {
 static int __init lowmem_init(void)
 {
 register_shrinker(&lowmem_shrinker);
+init_procfs_lmk();
 return 0;
 }
 device_initcall(lowmem_init);
diff --git a/drivers/staging/android/lowmemorykiller_stats.c 
b/drivers/staging/android/lowmemorykiller_stats.c
new file mode 100644
index 000..673691c
--- /dev/null
+++ b/drivers/staging/android/lowmemorykiller_stats.c
@@ -0,0 +1,85 @@
+/*
+ *  lowmemorykiller_stats
+ *
+ *  Copyright (C) 2017 Sony Mobile Communications Inc.
+ *
+ *  Author: Peter Enderborg 
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+/* This code is bookkeeping of statistical information
+ * from lowmemorykiller and provide a node in proc "/proc/lmkstats".
+ */
+
+#include 
+#include 
+#include "lowmemorykiller_stats.h"
+
+struct lmk_stats {
+atomic_long_t scans; /* counter as in shrinker scans */
+atomic_long_t kills; /* the number of sigkills sent */
+atomic_long_t waste