Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-15 Thread Sultan Alsawaf
On Wed, May 15, 2019 at 02:32:48PM -0400, Steven Rostedt wrote:
> I'm confused why you did this?

Oleg said that debug_locks_off() could've been called and thus prevented
lockdep complaints about simple_lmk from appearing. To eliminate any possibility
of that, I disabled debug_locks_off().

Oleg also said that __lock_acquire() could return early if lock debugging were
somehow turned off after lockdep reported one bug. To mitigate any possibility
of that as well, I threw in the BUG_ON() for good measure.

I think at this point it's pretty clear that lockdep truly isn't complaining
about simple_lmk's locking pattern, and that lockdep's lack of complaints isn't
due to it being mysteriously turned off...

Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-14 Thread Sultan Alsawaf
On Tue, May 14, 2019 at 12:44:53PM -0400, Steven Rostedt wrote:
> OK, this has gotten my attention.
> 
> This thread is quite long, do you have a git repo I can look at, and
> also where is the first task_lock() taken before the
> find_lock_task_mm()?
> 
> -- Steve

Hi Steve,

This is the git repo I work on: 
https://github.com/kerneltoast/android_kernel_google_wahoo

With the newest simple_lmk iteration being this commit: 
https://github.com/kerneltoast/android_kernel_google_wahoo/commit/6b145b8c28b39f7047393169117f72ea7387d91c

This repo is based off the 4.4 kernel that Google ships on the Pixel 2/2XL.

simple_lmk iterates through the entire task list more than once and locks
potential victims using find_lock_task_mm(). It keeps these potential victims
locked across the multiple times that the task list is iterated.

The locking pattern that Oleg said should cause lockdep to complain is that
iterating through the entire task list more than once can lead to locking the
same task that was locked earlier with find_lock_task_mm(), and thus deadlock.
But there is a check in simple_lmk that avoids locking potential victims that
were already found, which avoids the deadlock, but lockdep doesn't know about
the check (which is done with vtsk_is_duplicate()) and should therefore
complain.

Lockdep does not complain though.

Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-13 Thread Sultan Alsawaf
On Fri, May 10, 2019 at 05:10:25PM +0200, Oleg Nesterov wrote:
> I am starting to think I am ;)
> 
> If you have task1 != task2 this code
> 
>   task_lock(task1);
>   task_lock(task2);
> 
> should trigger print_deadlock_bug(), task1->alloc_lock and task2->alloc_lock 
> are
> the "same" lock from lockdep pov, held_lock's will have the same 
> hlock_class().

Okay, I've stubbed out debug_locks_off(), and lockdep is now complaining about a
bunch of false positives so it is _really_ enabled this time. I grepped for
lockdep last time to try and find a concise way to show over email that lockdep
didn't complain, but that backfired. Here is better evidence:

taimen:/ # dmesg | grep simple_lmk
[   58.349917] simple_lmk: Killing droid.deskclock with adj 906 to free 47548 
KiB
[   58.354748] simple_lmk: Killing .android.dialer with adj 906 to free 36576 
KiB
[   58.355030] simple_lmk: Killing rbandroid.sleep with adj 904 to free 50016 
KiB
[   58.582833] simple_lmk: Killing oadcastreceiver with adj 904 to free 43044 
KiB
[   58.587731] simple_lmk: Killing .apps.wellbeing with adj 902 to free 48128 
KiB
[   58.588084] simple_lmk: Killing android.carrier with adj 902 to free 43636 
KiB
[   58.671857] simple_lmk: Killing ndroid.keychain with adj 902 to free 39992 
KiB
[   58.675622] simple_lmk: Killing gs.intelligence with adj 900 to free 49572 
KiB
[   58.675770] simple_lmk: Killing le.modemservice with adj 800 to free 41976 
KiB
[   58.762060] simple_lmk: Killing ndroid.settings with adj 700 to free 74708 
KiB
[   58.763238] simple_lmk: Killing roid.apps.turbo with adj 700 to free 54660 
KiB
[   58.873337] simple_lmk: Killing d.process.acore with adj 700 to free 48540 
KiB
[   58.873513] simple_lmk: Killing d.process.media with adj 500 to free 46188 
KiB
[   58.873713] simple_lmk: Killing putmethod.latin with adj 200 to free 67304 
KiB
[   59.014046] simple_lmk: Killing android.vending with adj 201 to free 54900 
KiB
[   59.017623] simple_lmk: Killing rak.mibandtools with adj 201 to free 44552 
KiB
[   59.018423] simple_lmk: Killing eport.trebuchet with adj 100 to free 106208 
KiB
[   59.223633] simple_lmk: Killing id.printspooler with adj 900 to free 39664 
KiB
[   59.223762] simple_lmk: Killing gle.android.gms with adj 100 to free 64176 
KiB
[   70.955204] simple_lmk: Killing .apps.translate with adj 906 to free 47564 
KiB
[   70.955633] simple_lmk: Killing cloudprint:sync with adj 906 to free 31932 
KiB
[   70.955787] simple_lmk: Killing oid.apps.photos with adj 904 to free 50204 
KiB
[   71.060789] simple_lmk: Killing ecamera.android with adj 906 to free 32232 
KiB
[   71.061074] simple_lmk: Killing webview_service with adj 906 to free 26028 
KiB
[   71.061199] simple_lmk: Killing com.whatsapp with adj 904 to free 49484 KiB
[   71.190625] simple_lmk: Killing rbandroid.sleep with adj 906 to free 54724 
KiB
[   71.190775] simple_lmk: Killing android.vending with adj 906 to free 39848 
KiB
[   71.191303] simple_lmk: Killing m.facebook.orca with adj 904 to free 72296 
KiB
[   71.342042] simple_lmk: Killing droid.deskclock with adj 902 to free 49284 
KiB
[   71.342240] simple_lmk: Killing .apps.wellbeing with adj 900 to free 47632 
KiB
[   71.342529] simple_lmk: Killing le.modemservice with adj 800 to free 33648 
KiB
[   71.482391] simple_lmk: Killing d.process.media with adj 800 to free 40676 
KiB
[   71.482511] simple_lmk: Killing rdog.challegram with adj 700 to free 71920 
KiB
taimen:/ #

The first simple_lmk message appears at 58.349917. And based on the timestamps,
it's clear that simple_lmk called task_lock() for multiple different tasks,
which is the pattern you think should cause lockdep to complain. But here is the
full dmesg starting from that point:

[   58.349917] simple_lmk: Killing droid.deskclock with adj 906 to free 47548 
KiB
[   58.354748] simple_lmk: Killing .android.dialer with adj 906 to free 36576 
KiB
[   58.355030] simple_lmk: Killing rbandroid.sleep with adj 904 to free 50016 
KiB
[   58.432654] binder_alloc: 2284: binder_alloc_buf failed to map pages in 
userspace, no vma
[   58.432671] binder: 1206:1218 transaction failed 29189/-3, size 76-0 line 
3189
[   58.582833] simple_lmk: Killing oadcastreceiver with adj 904 to free 43044 
KiB
[   58.587731] simple_lmk: Killing .apps.wellbeing with adj 902 to free 48128 
KiB
[   58.588084] simple_lmk: Killing android.carrier with adj 902 to free 43636 
KiB
[   58.590785] binder: undelivered transaction 58370, process died.
[   58.671857] simple_lmk: Killing ndroid.keychain with adj 902 to free 39992 
KiB
[   58.675622] simple_lmk: Killing gs.intelligence with adj 900 to free 49572 
KiB
[   58.675770] simple_lmk: Killing le.modemservice with adj 800 to free 41976 
KiB
[   58.736678] binder: undelivered transaction 58814, process died.
[   58.736733] binder: release 3099:3128 transaction 57832 in, still active
[   58.736744] binder: send failed reply for transaction 57832 to 1876:3090
[   58.736761] binder: undelivered TRANSACTION_COMPLETE
[   

Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-09 Thread Sultan Alsawaf
On Thu, May 09, 2019 at 05:56:46PM +0200, Oleg Nesterov wrote:
> Impossible ;) I bet lockdep should report the deadlock as soon as 
> find_victims()
> calls find_lock_task_mm() when you already have a locked victim.

I hope you're not a betting man ;)

With the following configured:
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
# CONFIG_DEBUG_SPINLOCK_BITE_ON_BUG is not set
CONFIG_DEBUG_SPINLOCK_PANIC_ON_BUG=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set

And a printk added in vtsk_is_duplicate() to print when a duplicate is detected,
and my phone's memory cut in half to make simple_lmk do something, this is what
I observed:
taimen:/ # dmesg | grep lockdep
[0.00] \x09RCU lockdep checking is enabled.
taimen:/ # dmesg | grep simple_lmk
[   23.211091] simple_lmk: Killing android.carrier with adj 906 to free 37420 
kiB
[   23.211160] simple_lmk: Killing oadcastreceiver with adj 906 to free 36784 
kiB
[   23.248457] simple_lmk: Killing .apps.translate with adj 904 to free 45884 
kiB
[   23.248720] simple_lmk: Killing ndroid.settings with adj 904 to free 42868 
kiB
[   23.313417] simple_lmk: DUPLICATE VTSK!
[   23.313440] simple_lmk: Killing ndroid.keychain with adj 906 to free 33680 
kiB
[   23.313513] simple_lmk: Killing com.whatsapp with adj 904 to free 51436 kiB
[   34.646695] simple_lmk: DUPLICATE VTSK!
[   34.646717] simple_lmk: Killing ndroid.apps.gcs with adj 906 to free 37956 
kiB
[   34.646792] simple_lmk: Killing droid.apps.maps with adj 904 to free 63600 
kiB
taimen:/ # dmesg | grep lockdep
[0.00] \x09RCU lockdep checking is enabled.
taimen:/ # 

> As for 
> https://github.com/kerneltoast/android_kernel_google_wahoo/commit/afc8c9bf2dbde95941253c168d1adb64cfa2e3ad
> Well,
> 
>   mmdrop(mm);
>   simple_lmk_mm_freed(mm);
> 
> looks racy because mmdrop(mm) can free this mm_struct. Yes, 
> simple_lmk_mm_freed()
> does not dereference this pointer, but the same memory can be re-allocated as
> another ->mm for the new task which can be found by find_victims(), and _in 
> theory_
> this all can happen in between, so the "victims[i].mm == mm" can be false 
> positive.
> 
> And this also means that simple_lmk_mm_freed() should clear victims[i].mm when
> it detects "victims[i].mm == mm", otherwise we have the same theoretical race,
> victims_to_kill is only cleared when the last victim goes away.

Fair point. Putting simple_lmk_mm_freed(mm) right before mmdrop(mm), and
sprinkling in a cmpxchg in simple_lmk_mm_freed() should fix that up.

> Another nit... you can drop tasklist_lock right after the 1st "find_victims" 
> loop.

True!

> And it seems that you do not really need to walk the "victims" array twice 
> after that,
> you can do everything in a single loop, but this is cosmetic.

Won't this result in potentially holding the task lock way longer than necessary
for multiple tasks that aren't going to be killed?

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-07 Thread Sultan Alsawaf
On Tue, May 07, 2019 at 12:58:27PM +0200, Christian Brauner wrote:
> This is work that is ongoing and requires kernel changes to make it
> feasible. One of the things that I have been working on for quite a
> while is the whole file descriptor for processes thing that is important
> for LMKD (Even though I never thought about this use-case when I started
> pitching this.). Joel and Daniel have joined in and are working on
> making LMKD possible.
> What I find odd is that every couple of weeks different solutions to the
> low memory problem are pitched. There is simple_lkml, there is LMKD, and
> there was a patchset that wanted to speed up memory reclaim at process
> kill-time by adding a new flag to the new pidfd_send_signal() syscall.
> That all seems - though related - rather uncoordinated. Now granted,
> coordinated is usually not how kernel development necessarily works but
> it would probably be good to have some sort of direction and from what I
> have seen LMKD seems to be the most coordinated effort. But that might
> just be my impression.

LMKD is just Android's userspace low-memory-killer daemon. It's been around for
a while.

This patch (simple_lmk) is meant to serve as an immediate solution for the
devices that'll never see a single line of PSI code running on them, which
amounts to... well, most Android devices currently in existence. I'm more of a
cowboy who made this patch after waiting a few years for memory management
improvements on Android that never happened. Though it looks like it's going to
happen soon(ish?) for super new devices that'll have the privilege of shipping
with PSI in use.

On Tue, May 07, 2019 at 01:09:21PM +0200, Greg Kroah-Hartman wrote:
> > It's even more odd that although a userspace solution is touted as the 
> > proper
> > way to go on LKML, almost no Android OEMs are using it, and even in that 
> > commit
> > I linked in the previous message, Google made a rather large set of
> > modifications to the supposedly-defunct lowmemorykiller.c not one month ago.
> > What's going on?
> 
> "almost no"?  Again, Android Go is doing that, right?

I'd check for myself, but I can't seem to find kernel source for an Android Go
device...

This seems more confusing though. Why would the ultra-low-end devices use LMKD
while other devices use the broken lowmemorykiller driver?

> > Qualcomm still uses lowmemorykiller.c [1] on the Snapdragon 845.
> 
> Qualcomm should never be used as an example of a company that has any
> idea of what to do in their kernel :)

Agreed, but nearly all OEMs that use Qualcomm chipsets roll with Qualcomm's
kernel decisions, so Qualcomm has a bit of influence here.

> > If PSI were backported to 4.4, or even 3.18, would it really be used?
> 
> Why wouldn't it, if it worked properly?

For the same mysterious reason that Qualcomm and others cling to
lowmemorykiller, I presume. This is part of what's been confusing me for quite
some time...

> Please see the work that went into PSI and the patches around it.
> There's also a lwn.net article last week about the further work ongoing
> in this area.  With all of that, you should see how in-kernel memory
> killers are NOT the way to go.
> 
> > Regardless, even if PSI were backported, a full-fledged LMKD using it has 
> > yet to
> > be made, so it wouldn't be of much use now.
> 
> "LMKD"?  Again, PSI is in the 4.9.y android-common tree, so the
> userspace side should be in AOSP, right?

LMKD as in Android's low-memory-killer daemon. It is in AOSP, but it looks like
it's still a work in progress.

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-07 Thread Sultan Alsawaf
On Tue, May 07, 2019 at 09:28:47AM -0700, Suren Baghdasaryan wrote:
> Hi Sultan,
> Looks like you are posting this patch for devices that do not use
> userspace LMKD solution due to them using older kernels or due to
> their vendors sticking to in-kernel solution. If so, I see couple
> logistical issues with this patch. I don't see it being adopted in
> upstream kernel 5.x since it re-implements a deprecated mechanism even
> though vendors still use it. Vendors on the other hand, will not adopt
> it until you show evidence that it works way better than what
> lowmemorykilled driver does now. You would have to provide measurable
> data and explain your tests before they would consider spending time
> on this.

Yes, this is mostly for the devices already produced that are forced to suffer
with poor memory management. I can't even convince vendors to fix kernel
memory leaks, so there's no way I'd be able to convince them of trying this
patch, especially since it seems like you're having trouble convincing vendors
to stop using lowmemorykiller in the first place. And thankfully, convincing
vendors isn't my job :)

> On the implementation side I'm not convinced at all that this would
> work better on all devices and in all circumstances. We had cases when
> a new mechanism would show very good results until one usecase
> completely broke it. Bulk killing of processes that you are doing in
> your patch was a very good example of such a decision which later on
> we had to rethink. That's why baking these policies into kernel is
> very problematic. Another problem I see with the implementation that
> it ties process killing with the reclaim scan depth. It's very similar
> to how vmpressure works and vmpressure in my experience is very
> unpredictable.

Could you elaborate a bit on why bulk killing isn't good?

> > > I linked in the previous message, Google made a rather large set of
> > > modifications to the supposedly-defunct lowmemorykiller.c not one month 
> > > ago.
> > > What's going on?
> 
> If you look into that commit, it adds ability to report kill stats. If
> that was a change in how that driver works it would be rejected.

Fair, though it was quite strange seeing something that was supposedly totally
abandoned receiving a large chunk of code for reporting stats.

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-07 Thread Sultan Alsawaf
On Tue, May 07, 2019 at 05:31:54PM +0200, Oleg Nesterov wrote:
> I am not going to comment the intent, but to be honest I am skeptical too.

The general sentiment has been that this is a really bad idea, but I'm just a
frustrated Android user who wants his phone to not require mountains of zRAM
only to still manage memory poorly. Until I can go out and buy a non-Pixel phone
that uses PSI to make these decisions (and does a good job of it), I'm going to
stick to my hacky little driver for my personal devices. Many others who like to
hack their Android devices to make them last longer will probably find value in
this as well, since there are millions of people who use devices that'll never
seen any of PSI ported to their ancient 3.x kernels.

And yes, I know this would never be accepted to upstream in a million years. I
mostly wanted some code review and guidance, since mm code is pretty tricky :)

> On 05/06, Sultan Alsawaf wrote:
> >
> > +static unsigned long find_victims(struct victim_info *varr, int *vindex,
> > + int vmaxlen, int min_adj, int max_adj)
> > +{
> > +   unsigned long pages_found = 0;
> > +   int old_vindex = *vindex;
> > +   struct task_struct *tsk;
> > +
> > +   for_each_process(tsk) {
> > +   struct task_struct *vtsk;
> > +   unsigned long tasksize;
> > +   short oom_score_adj;
> > +
> > +   /* Make sure there's space left in the victim array */
> > +   if (*vindex == vmaxlen)
> > +   break;
> > +
> > +   /* Don't kill current, kthreads, init, or duplicates */
> > +   if (same_thread_group(tsk, current) ||
> > +   tsk->flags & PF_KTHREAD ||
> > +   is_global_init(tsk) ||
> > +   vtsk_is_duplicate(varr, *vindex, tsk))
> > +   continue;
> > +
> > +   vtsk = find_lock_task_mm(tsk);
> 
> Did you test this patch with lockdep enabled?
> 
> If I read the patch correctly, lockdep should complain. vtsk_is_duplicate()
> ensures that we do not take the same ->alloc_lock twice or more, but lockdep
> can't know this.

Yeah, lockdep is fine with this, at least on 4.4.

> > +static void scan_and_kill(unsigned long pages_needed)
> > +{
> > +   static DECLARE_WAIT_QUEUE_HEAD(victim_waitq);
> > +   struct victim_info victims[MAX_VICTIMS];
> > +   int i, nr_to_kill = 0, nr_victims = 0;
> > +   unsigned long pages_found = 0;
> > +   atomic_t victim_count;
> > +
> > +   /*
> > +* Hold the tasklist lock so tasks don't disappear while scanning. This
> > +* is preferred to holding an RCU read lock so that the list of tasks
> > +* is guaranteed to be up to date. Keep preemption disabled until the
> > +* SIGKILLs are sent so the victim kill process isn't interrupted.
> > +*/
> > +   read_lock(_lock);
> > +   preempt_disable();
> 
> read_lock() disables preemption, every task_lock() too, so this looks
> unnecessary.

Good point.

> > +   for (i = 1; i < ARRAY_SIZE(adj_prio); i++) {
> > +   pages_found += find_victims(victims, _victims, MAX_VICTIMS,
> > +   adj_prio[i], adj_prio[i - 1]);
> > +   if (pages_found >= pages_needed || nr_victims == MAX_VICTIMS)
> > +   break;
> > +   }
> > +
> > +   /*
> > +* Calculate the number of tasks that need to be killed and quickly
> > +* release the references to those that'll live.
> > +*/
> > +   for (i = 0, pages_found = 0; i < nr_victims; i++) {
> > +   struct victim_info *victim = [i];
> > +   struct task_struct *vtsk = victim->tsk;
> > +
> > +   /* The victims' mm lock is taken in find_victims; release it */
> > +   if (pages_found >= pages_needed) {
> > +   task_unlock(vtsk);
> > +   continue;
> > +   }
> > +
> > +   /*
> > +* Grab a reference to the victim so it doesn't disappear after
> > +* the tasklist lock is released.
> > +*/
> > +   get_task_struct(vtsk);
> 
> The comment doesn't look correct. the victim can't dissapear until 
> task_unlock()
> below, it can't pass exit_mm().

I was always unsure about this and decided to hold a reference to the
task_struct to be safe. Thanks for clearing that up.

> > +   pages_found += victim->size;
> > +   nr_to_kill++;
> > +   }
> > +   read_unlock(_lock);
> > +
> > +   /* Kill the victims */
> > +   victim_co

Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-07 Thread Sultan Alsawaf
On Tue, May 07, 2019 at 09:43:34AM +0200, Greg Kroah-Hartman wrote:
> Given that any "new" android device that gets shipped "soon" should be
> using 4.9.y or newer, is this a real issue?

It's certainly a real issue for those who can't buy brand new Android devices
without software bugs every six months :)

> And if it is, I'm sure that asking for those patches to be backported to
> 4.4.y would be just fine, have you asked?
>
> Note that I know of Android Go devices, running 3.18.y kernels, do NOT
> use the in-kernel memory killer, but instead use the userspace solution
> today.  So trying to get another in-kernel memory killer solution added
> anywhere seems quite odd.

It's even more odd that although a userspace solution is touted as the proper
way to go on LKML, almost no Android OEMs are using it, and even in that commit
I linked in the previous message, Google made a rather large set of
modifications to the supposedly-defunct lowmemorykiller.c not one month ago.
What's going on?

Qualcomm still uses lowmemorykiller.c [1] on the Snapdragon 845. If PSI were
backported to 4.4, or even 3.18, would it really be used? I don't really
understand the aversion to an in-kernel memory killer on LKML despite the rest
of the industry's attraction to it. Perhaps there's some inherently great cost
in using the userspace solution that I'm unaware of?

Regardless, even if PSI were backported, a full-fledged LMKD using it has yet to
be made, so it wouldn't be of much use now.

Thanks,
Sultan

[1] 
https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/arch/arm64/configs/sdm845_defconfig?h=LA.UM.7.3.r1-07400-sdm845.0#n492
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-07 Thread Sultan Alsawaf
On Tue, May 07, 2019 at 09:04:30AM +0200, Greg Kroah-Hartman wrote:
> Um, why can't "all" Android devices take the same patches that the Pixel
> phones are using today?  They should all be in the public android-common
> kernel repositories that all Android devices should be syncing with on a
> weekly/monthly basis anyway, right?
> 
> thanks,
> 
> greg k-h

Hi Greg,

I only see PSI present in the android-common kernels for 4.9 and above. The vast
majority of Android devices do not run a 4.9+ kernel. It seems unreasonable to
expect OEMs to toil with backporting PSI themselves to get decent memory
management.

But even if they did backport PSI, it wouldn't help too much because a
PSI-enabled LMKD solution is not ready yet. It looks like a PSI-based LMKD is
still under heavy development and won't be ready for all Android devices for
quite some time.

Additionally, it looks like the supposedly-dead lowmemorykiller.c is still being
actively tweaked by Google [1], which does not instill confidence that a
definitive LMK solution a la PSI is coming any time soon.

Thanks,
Sultan

[1] 
https://android.googlesource.com/kernel/common/+/152bacdd85c46f0c76b00c4acc253e414513634c
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-06 Thread Sultan Alsawaf
This is a complete low memory killer solution for Android that is small
and simple. Processes are killed according to the priorities that
Android gives them, so that the least important processes are always
killed first. Processes are killed until memory deficits are satisfied,
as observed from kswapd struggling to free up pages. Simple LMK stops
killing processes when kswapd finally goes back to sleep.

The only tunables are the desired amount of memory to be freed per
reclaim event and desired frequency of reclaim events. Simple LMK tries
to free at least the desired amount of memory per reclaim and waits
until all of its victims' memory is freed before proceeding to kill more
processes.

Signed-off-by: Sultan Alsawaf 
---
Hello everyone,

I've addressed some of the concerns that were brought up with the first version
of the Simple LMK patch. I understand that a kernel-based solution like this
that contains policy decisions for a specific userspace is not the way to go,
but the Android ecosystem still has a pressing need for a low memory killer that
works well.

Most Android devices still use the ancient and deprecated lowmemorykiller.c
kernel driver; Simple LMK seeks to replace that, at the very least until PSI and
a userspace daemon utilizing PSI are ready for *all* Android devices, and not
just the privileged Pixel phone line.

The feedback I received last time was quite helpful. This Simple LMK patch works
significantly better than the first, improving memory management by a large
margin while being immune to transient spikes in memory usage (since the signal
to start killing processes is determined by how hard kswapd tries to free up
pages, which is something that occurs over a span of time and not a single point
in time).

I'd love to hear some feedback on this new patch. I do encourage those who are
interested to take it for a spin on an Android device. This patch has been
tested successfully on Android 4.4 and 4.9 kernels. For the sake of review here,
I have adapted the patch to 5.1.

Thanks,
Sultan

 drivers/android/Kconfig  |  33 
 drivers/android/Makefile |   1 +
 drivers/android/simple_lmk.c | 315 +++
 include/linux/mm_types.h |   4 +
 include/linux/simple_lmk.h   |  11 ++
 kernel/fork.c|  13 ++
 mm/vmscan.c  |  12 ++
 7 files changed, 389 insertions(+)
 create mode 100644 drivers/android/simple_lmk.c
 create mode 100644 include/linux/simple_lmk.h

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 6fdf2abe4..bdd429338 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -54,6 +54,39 @@ config ANDROID_BINDER_IPC_SELFTEST
  exhaustively with combinations of various buffer sizes and
  alignments.
 
+config ANDROID_SIMPLE_LMK
+   bool "Simple Android Low Memory Killer"
+   depends on !ANDROID_LOW_MEMORY_KILLER && !MEMCG
+   ---help---
+ This is a complete low memory killer solution for Android that is
+ small and simple. Processes are killed according to the priorities
+ that Android gives them, so that the least important processes are
+ always killed first. Processes are killed until memory deficits are
+ satisfied, as observed from kswapd struggling to free up pages. Simple
+ LMK stops killing processes when kswapd finally goes back to sleep.
+
+if ANDROID_SIMPLE_LMK
+
+config ANDROID_SIMPLE_LMK_AGGRESSION
+   int "Reclaim frequency selection"
+   range 1 3
+   default 1
+   help
+ This value determines how frequently Simple LMK will perform memory
+ reclaims. A lower value corresponds to less frequent reclaims, which
+ maximizes memory usage. The range of values has a logarithmic
+ correlation; 2 is twice as aggressive as 1, and 3 is twice as
+ aggressive as 2, which makes 3 four times as aggressive as 1.
+
+config ANDROID_SIMPLE_LMK_MINFREE
+   int "Minimum MiB of memory to free per reclaim"
+   range 8 512
+   default 64
+   help
+ Simple LMK will try to free at least this much memory per reclaim.
+
+endif
+
 endif # if ANDROID
 
 endmenu
diff --git a/drivers/android/Makefile b/drivers/android/Makefile
index c7856e320..7c91293b6 100644
--- a/drivers/android/Makefile
+++ b/drivers/android/Makefile
@@ -3,3 +3,4 @@ ccflags-y += -I$(src)   # needed for trace 
events
 obj-$(CONFIG_ANDROID_BINDERFS) += binderfs.o
 obj-$(CONFIG_ANDROID_BINDER_IPC)   += binder.o binder_alloc.o
 obj-$(CONFIG_ANDROID_BINDER_IPC_SELFTEST) += binder_alloc_selftest.o
+obj-$(CONFIG_ANDROID_SIMPLE_LMK)   += simple_lmk.o
diff --git a/drivers/android/simple_lmk.c b/drivers/android/simple_lmk.c
new file mode 100644
index 0..a2ffb57b5
--- /dev/null
+++ b/drivers/android/simple_lmk.c
@@ -0,0 +1,315 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Sultan Alsawaf .
+

Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-14 Thread Sultan Alsawaf
On Thu, Mar 14, 2019 at 11:16:41PM -0400, Steven Rostedt wrote:
> How would you implement such a method in userspace? kill() doesn't take
> any parameters but the pid of the process you want to send a signal to,
> and the signal to send. This would require a new system call, and be
> quite a bit of work. If you can solve this with an ebpf program, I
> strongly suggest you do that instead.

This can be done by introducing a new signal number that provides SIGKILL
functionality while blocking (maybe SIGKILLBLOCK?).

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-14 Thread Sultan Alsawaf
On Thu, Mar 14, 2019 at 10:54:48PM -0400, Joel Fernandes wrote:
> I'm not sure if that makes much semantic sense for how the signal handling is
> supposed to work. Imagine a parent sends SIGKILL to its child, and then does
> a wait(2). Because the SIGKILL blocks in your idea, then the wait cannot
> execute, and because the wait cannot execute, the zombie task will not get
> reaped and so the SIGKILL senders never gets unblocked and the whole thing
> just gets locked up. No? I don't know it just feels incorrect.

Block until the victim becomes a zombie instead.

> Further, in your idea adding stuff to task_struct will simply bloat it - when
> this task can easily be handled using eBPF without making any kernel changes.
> Either by probing sched_process_free or sched_process_exit tracepoints.
> Scheduler maintainers generally frown on adding stuff to task_struct
> pointlessly there's a good reason since bloating it effects the performance
> etc, and something like this would probably never be ifdef'd out behind a
> CONFIG.

Adding something to task_struct is just the easiest way to test things for
experimentation. This can be avoided in my suggestion by passing the pointer to
a completion via the relevant functions, and then completing it at the time the
victim transitions to a zombie state. I understand it's possible to use eBPF for
this, but it seems kind of messy since this functionality is something that I
think others would want provided by the kernel (i.e., anyone using PSI to
implement their own OOM killer daemon similar to LMKD).

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-14 Thread Sultan Alsawaf
On Thu, Mar 14, 2019 at 10:47:17AM -0700, Joel Fernandes wrote:
> About the 100ms latency, I wonder whether it is that high because of
> the way Android's lmkd is observing that a process has died. There is
> a gap between when a process memory is freed and when it disappears
> from the process-table.  Once a process is SIGKILLed, it becomes a
> zombie. Its memory is freed instantly during the SIGKILL delivery (I
> traced this so that's how I know), but until it is reaped by its
> parent thread, it will still exist in /proc/ . So if testing the
> existence of /proc/ is how Android is observing that the process
> died, then there can be a large latency where it takes a very long
> time for the parent to actually reap the child way after its memory
> was long freed. A quicker way to know if a process's memory is freed
> before it is reaped could be to read back /proc//maps in
> userspace of the victim , and that file will be empty for zombie
> processes. So then one does not need wait for the parent to reap it. I
> wonder how much of that 100ms you mentioned is actually the "Waiting
> while Parent is reaping the child", than "memory freeing time". So
> yeah for this second problem, the procfds work will help.
>
> By the way another approach that can provide a quick and asynchronous
> notification of when the process memory is freed, is to monitor
> sched_process_exit trace event using eBPF. You can tell eBPF the PID
> that you want to monitor before the SIGKILL. As soon as the process
> dies and its memory is freed, the eBPF program can send a notification
> to user space (using the perf_events polling infra). The
> sched_process_exit fires just after the mmput() happens so it is quite
> close to when the memory is reclaimed. This also doesn't need any
> kernel changes. I could come up with a prototype for this and
> benchmark it on Android, if you want. Just let me know.

Perhaps I'm missing something, but if you want to know when a process has died
after sending a SIGKILL to it, then why not just make the SIGKILL optionally
block until the process has died completely? It'd be rather trivial to just
store a pointer to an onstack completion inside the victim process' task_struct,
and then complete it in free_task().

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-12 Thread Sultan Alsawaf
On Tue, Mar 12, 2019 at 10:17:43AM -0700, Tim Murray wrote:
> Knowing whether a SIGKILL'd process has finished reclaiming is as far
> as I know not possible without something like procfds. That's where
> the 100ms timeout in lmkd comes in. lowmemorykiller and lmkd both
> attempt to wait up to 100ms for reclaim to finish by checking for the
> continued existence of the thread that received the SIGKILL, but this
> really means that they wait up to 100ms for the _thread_ to finish,
> which doesn't tell you anything about the memory used by that process.
> If those threads terminate early and lowmemorykiller/lmkd get a signal
> to kill again, then there may be two processes competing for CPU time
> to reclaim memory. That doesn't reclaim any faster and may be an
> unnecessary kill.
> ...
> - offer a way to wait for process termination so lmkd can tell when
> reclaim has finished and know when killing another process is
> appropriate

Should be pretty easy with something like this:
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1549584a1..6ac478af2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1199,6 +1199,7 @@ struct task_struct {
unsigned long   lowest_stack;
unsigned long   prev_lowest_stack;
 #endif
+   ktime_t sigkill_time;
 
/*
 * New fields for task_struct should be added above here, so that
diff --git a/kernel/fork.c b/kernel/fork.c
index 9dcd18aa2..0ae182777 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -435,6 +435,8 @@ void put_task_stack(struct task_struct *tsk)
 
 void free_task(struct task_struct *tsk)
 {
+   ktime_t sigkill_time = tsk->sigkill_time;
+   pid_t pid = tsk->pid;
 #ifndef CONFIG_THREAD_INFO_IN_TASK
/*
 * The task is finally done with both the stack and thread_info,
@@ -455,6 +457,9 @@ void free_task(struct task_struct *tsk)
if (tsk->flags & PF_KTHREAD)
free_kthread_struct(tsk);
free_task_struct(tsk);
+   if (sigkill_time)
+   printk("%d killed after %lld us\n", pid,
+  ktime_us_delta(ktime_get(), sigkill_time));
 }
 EXPORT_SYMBOL(free_task);
 
@@ -1881,6 +1886,7 @@ static __latent_entropy struct task_struct *copy_process(
p->sequential_io= 0;
p->sequential_io_avg= 0;
 #endif
+   p->sigkill_time = 0;
 
/* Perform scheduler related setup. Assign this task to a CPU. */
retval = sched_fork(clone_flags, p);
diff --git a/kernel/signal.c b/kernel/signal.c
index 5d53183e2..1142c8811 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1168,6 +1168,8 @@ static int __send_signal(int sig, struct kernel_siginfo 
*info, struct task_struc
}
 
 out_set:
+   if (sig == SIGKILL)
+   t->sigkill_time = ktime_get();
signalfd_notify(t, sig);
sigaddset(>signal, sig);
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-12 Thread Sultan Alsawaf
On Tue, Mar 12, 2019 at 09:05:32AM +0100, Michal Hocko wrote:
> The only way to control the OOM behavior pro-actively is to throttle
> allocation speed. We have memcg high limit for that purpose. Along with
> PSI, I can imagine a reasonably working user space early oom
> notifications and reasonable acting upon that.

The issue with pro-active memory management that prompted me to create this was
poor memory utilization. All of the alternative means of reclaiming pages in the
page allocator's slow path turn out to be very useful for maximizing memory
utilization, which is something that we would have to forgo by relying on a
purely pro-active solution. I have not had a chance to look at PSI yet, but
unless a PSI-enabled solution allows allocations to reach the same point as when
the OOM killer is invoked (which is contradictory to what it sets out to do),
then it cannot take advantage of all of the alternative memory-reclaim means
employed in the slowpath, and will result in killing a process before it is
_really_ necessary.

> If you design is relies on the speed of killing then it is fundamentally
> flawed AFAICT. You cannot assume anything about how quickly a task dies.
> It might be blocked in an uninterruptible sleep or performin an
> operation which takes some time. Sure, oom_reaper might help here but
> still.

In theory we could instantly zap any process that is not trapped in the kernel
at the time that the OOM killer is invoked without any consequences though, no?

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-11 Thread Sultan Alsawaf
On Mon, Mar 11, 2019 at 03:15:35PM -0700, Suren Baghdasaryan wrote:
> This what LMKD currently is - a userspace RT process.
> My point was that this page allocation queue that you implemented
> can't be implemented in userspace, at least not without extensive
> communication with kernel.

Oh, that's easy to address. My page allocation queue and the decision on when to
kill a process are orthogonal. In fact, the page allocation queue could be
touched up a bit to factor in the issues Michal mentioned, and it can be
implemented as an improvement to the existing OOM killer. The point of it is
just to ensure that page allocation requests that have gone OOM are given
priority over other allocation requests when free pages start to trickle in.

Userspace doesn't need to know about the page allocation queue, and the queue is
not necessary to implement the method of determining when to kill processes that
I've proposed. It's an optimization, not a necessity.

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-11 Thread Sultan Alsawaf
On Mon, Mar 11, 2019 at 05:11:25PM -0400, Joel Fernandes wrote:
> But the point is that a transient temporary memory spike should not be a
> signal to kill _any_ process.  The reaction to kill shouldn't be so
> spontaneous that unwanted tasks are killed because the system went into
> panic mode. It should be averaged out which I believe is what PSI does.

In my patch from the first email, I implemented the decision to kill a process
at the same time that the existing kernel OOM killer decides to kill a process.
If the kernel's OOM killer were susceptible to killing processes due to
transient memory spikes, then I think there would have been several complaints
about this behavior regardless of which userspace or architecture is in use.
I think the existing OOM killer has this done right.

The decision to kill a process occurs after the page allocator has tried _very_
hard to satisfy a page allocation via alternative means, such as utilizing
compaction, flushing file-backed pages to disk via kswapd, and direct reclaim.
Once all of those means have failed, it is quite reasonable to kill a process to
free memory. Trying to wait out the memory starvation at this point would be
futile.

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-11 Thread Sultan Alsawaf
On Mon, Mar 11, 2019 at 01:10:36PM -0700, Suren Baghdasaryan wrote:
> The idea seems interesting although I need to think about this a bit
> more. Killing processes based on failed page allocation might backfire
> during transient spikes in memory usage.

This issue could be alleviated if tasks could be killed and have their pages
reaped faster. Currently, Linux takes a _very_ long time to free a task's memory
after an initial privileged SIGKILL is sent to a task, even with the task's
priority being set to the highest possible (so unwanted scheduler preemption
starving dying tasks of CPU time is not the issue at play here). I've
frequently measured the difference in time between when a SIGKILL is sent for a
task and when free_task() is called for that task to be hundreds of
milliseconds, which is incredibly long. AFAIK, this is a problem that LMKD
suffers from as well, and perhaps any OOM killer implementation in Linux, since
you cannot evaluate effect you've had on memory pressure by killing a process
for at least several tens of milliseconds.

> AFAIKT the biggest issue with using this approach in userspace is that
> it's not practically implementable without heavy in-kernel support.
> How to implement such interaction between kernel and userspace would
> be an interesting discussion which I would be happy to participate in.

You could signal a lightweight userspace process that has maximum scheduler
priority and have it kill the tasks it'd like.

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-11 Thread Sultan Alsawaf
On Mon, Mar 11, 2019 at 06:43:20PM +0100, Michal Hocko wrote:
> I am sorry but we are not going to maintain two different OOM
> implementations in the kernel. From a quick look the implementation is
> quite a hack which is not really suitable for anything but a very
> specific usecase. E.g. reusing a freed page for a waiting allocation
> sounds like an interesting idea but it doesn't really work for many
> reasons. E.g. any NUMA affinity is broken, zone protection doesn't work
> either. Not to mention how the code hooks into the allocator hot paths.
> This is simply no no.
> 
> Last but not least people have worked really hard to provide means (PSI)
> to do what you need in the userspace.

Hi Michal,

Thanks for the feedback. I had no doubt that this would be vehemently rejected
on the mailing list, but I wanted feedback/opinions on it and thus sent it as 
anRFC. At best I thought perhaps the mechanisms I've employed might serve as
inspiration for LMKD improvements in Android, since this hacky OOM killer I've
devised does work quite well for the very specific usecase it is set out to
address. The NUMA affinity and zone protection bits are helpful insights too.

I'll take a look at PSI which Joel mentioned as well.

Thanks,
Sultan Alsawaf
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-10 Thread Sultan Alsawaf
On Sun, Mar 10, 2019 at 10:03:35PM +0100, Greg Kroah-Hartman wrote:
> On Sun, Mar 10, 2019 at 01:34:03PM -0700, Sultan Alsawaf wrote:
> > From: Sultan Alsawaf 
> > 
> > This is a complete low memory killer solution for Android that is small
> > and simple. It kills the largest, least-important processes it can find
> > whenever a page allocation has completely failed (right after direct
> > reclaim). Processes are killed according to the priorities that Android
> > gives them, so that the least important processes are always killed
> > first. Killing larger processes is preferred in order to free the most
> > memory possible in one go.
> > 
> > Simple LMK is integrated deeply into the page allocator in order to
> > catch exactly when a page allocation fails and exactly when a page is
> > freed. Failed page allocations that have invoked Simple LMK are placed
> > on a queue and wait for Simple LMK to satisfy them. When a page is about
> > to be freed, the failed page allocations are given priority over normal
> > page allocations by Simple LMK to see if they can immediately use the
> > freed page.
> > 
> > Additionally, processes are continuously killed by failed small-order
> > page allocations until they are satisfied.
> > 
> > Signed-off-by: Sultan Alsawaf 
> 
> Wait, why?  We just removed the in-kernel android memory killer, we
> don't want to add another one back again, right?  Android Go devices
> work just fine with the userspace memory killer code, and those are "low
> memory" by design.
> 
> Why do we need kernel code here at all?
> 
> thanks,
> 
> greg k-h

Hi Greg,

Thanks for replying. It has not been my experience and the experience of many
others that Android's userspace low memory memory killer works "just fine." On
my Pixel 3 XL with a meager 4GB of memory, the userspace killer has had issues
with killing too many processes, which has resulted in a noticeably poor user
experience for all Pixel owners. From the looks of lmkd on the master branch,
there still isn't really any definitive solution for this, aside from a 100ms
delay in between process kills.

I think that the userspace low memory killer is more complex than necessary,
especially since in the kernel we can detect exactly when we run out of memory
and react far more quickly than any userspace daemon.

The original reasoning behind why the old kernel low memory killer was removed
is also a bit vague to me. It just seemed to be abandonware, and all of a sudden
a userspace daemon was touted as the solution.

This driver is like an Android-flavored version of the kernel's oom killer, and
has proven quite effective for me on my Pixel. Processes are killed exactly when
a page allocation fails, so memory use is maximized. There is no complexity to
try and estimate how full memory is either.

Thanks,
Sultan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-03-10 Thread Sultan Alsawaf
From: Sultan Alsawaf 

This is a complete low memory killer solution for Android that is small
and simple. It kills the largest, least-important processes it can find
whenever a page allocation has completely failed (right after direct
reclaim). Processes are killed according to the priorities that Android
gives them, so that the least important processes are always killed
first. Killing larger processes is preferred in order to free the most
memory possible in one go.

Simple LMK is integrated deeply into the page allocator in order to
catch exactly when a page allocation fails and exactly when a page is
freed. Failed page allocations that have invoked Simple LMK are placed
on a queue and wait for Simple LMK to satisfy them. When a page is about
to be freed, the failed page allocations are given priority over normal
page allocations by Simple LMK to see if they can immediately use the
freed page.

Additionally, processes are continuously killed by failed small-order
page allocations until they are satisfied.

Signed-off-by: Sultan Alsawaf 
---
 drivers/android/Kconfig  |  28 
 drivers/android/Makefile |   1 +
 drivers/android/simple_lmk.c | 301 +++
 include/linux/sched.h|   3 +
 include/linux/simple_lmk.h   |  11 ++
 kernel/fork.c|   3 +
 mm/page_alloc.c  |  13 ++
 7 files changed, 360 insertions(+)
 create mode 100644 drivers/android/simple_lmk.c
 create mode 100644 include/linux/simple_lmk.h

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 6fdf2abe4..7469d049d 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -54,6 +54,34 @@ config ANDROID_BINDER_IPC_SELFTEST
  exhaustively with combinations of various buffer sizes and
  alignments.
 
+config ANDROID_SIMPLE_LMK
+   bool "Simple Android Low Memory Killer"
+   depends on !MEMCG
+   ---help---
+ This is a complete low memory killer solution for Android that is
+ small and simple. It is integrated deeply into the page allocator to
+ know exactly when a page allocation hits OOM and exactly when a page
+ is freed. Processes are killed according to the priorities that
+ Android gives them, so that the least important processes are always
+ killed first.
+
+if ANDROID_SIMPLE_LMK
+
+config ANDROID_SIMPLE_LMK_MINFREE
+   int "Minimum MiB of memory to free per reclaim"
+   default "64"
+   help
+ Simple LMK will free at least this many MiB of memory per reclaim.
+
+config ANDROID_SIMPLE_LMK_KILL_TIMEOUT
+   int "Kill timeout in milliseconds"
+   default "50"
+   help
+ Simple LMK will only perform memory reclaim at most once per this
+ amount of time.
+
+endif # if ANDROID_SIMPLE_LMK
+
 endif # if ANDROID
 
 endmenu
diff --git a/drivers/android/Makefile b/drivers/android/Makefile
index c7856e320..7c91293b6 100644
--- a/drivers/android/Makefile
+++ b/drivers/android/Makefile
@@ -3,3 +3,4 @@ ccflags-y += -I$(src)   # needed for trace 
events
 obj-$(CONFIG_ANDROID_BINDERFS) += binderfs.o
 obj-$(CONFIG_ANDROID_BINDER_IPC)   += binder.o binder_alloc.o
 obj-$(CONFIG_ANDROID_BINDER_IPC_SELFTEST) += binder_alloc_selftest.o
+obj-$(CONFIG_ANDROID_SIMPLE_LMK)   += simple_lmk.o
diff --git a/drivers/android/simple_lmk.c b/drivers/android/simple_lmk.c
new file mode 100644
index 0..8a441650a
--- /dev/null
+++ b/drivers/android/simple_lmk.c
@@ -0,0 +1,301 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Sultan Alsawaf .
+ */
+
+#define pr_fmt(fmt) "simple_lmk: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MIN_FREE_PAGES (CONFIG_ANDROID_SIMPLE_LMK_MINFREE * SZ_1M / PAGE_SIZE)
+
+struct oom_alloc_req {
+   struct page *page;
+   struct completion done;
+   struct list_head lh;
+   unsigned int order;
+   int migratetype;
+};
+
+struct victim_info {
+   struct task_struct *tsk;
+   unsigned long size;
+};
+
+enum {
+   DISABLED,
+   STARTING,
+   READY,
+   KILLING
+};
+
+/* Pulled from the Android framework */
+static const short int adj_prio[] = {
+   906, /* CACHED_APP_MAX_ADJ */
+   905, /* Cached app */
+   904, /* Cached app */
+   903, /* Cached app */
+   902, /* Cached app */
+   901, /* Cached app */
+   900, /* CACHED_APP_MIN_ADJ */
+   800, /* SERVICE_B_ADJ */
+   700, /* PREVIOUS_APP_ADJ */
+   600, /* HOME_APP_ADJ */
+   500, /* SERVICE_ADJ */
+   400, /* HEAVY_WEIGHT_APP_ADJ */
+   300, /* BACKUP_APP_ADJ */
+   200, /* PERCEPTIBLE_APP_ADJ */
+   100, /* VISIBLE_APP_ADJ */
+   0/* FOREGROUND_APP_ADJ */
+};
+
+/* Make sure that PID_MAX_DEFAULT isn't too big, or these arrays will be huge 
*/
+static struct victim_info victim_array[PID_MAX_DEFAULT];
+static