Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-16 Thread Oleg Nesterov
On 05/15, Sultan Alsawaf wrote:
>
> On Wed, May 15, 2019 at 04:58:32PM +0200, Oleg Nesterov wrote:
> > Could you explain in detail what exactly did you do and what do you see in 
> > dmesg?
> >
> > Just in case, lockdep complains only once, print_circular_bug() does 
> > debug_locks_off()
> > so it it has already reported another false positive __lock_acquire() will 
> > simply
> > return after that.
> >
> > Oleg.
>
> This is what I did:
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 774ab79d3ec7..009e7d431a88 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3078,6 +3078,7 @@ static int __lock_acquire(struct lockdep_map *lock, 
> unsigned int subclass,
> int class_idx;
> u64 chain_key;
>
> +   BUG_ON(!debug_locks || !prove_locking);
> if (unlikely(!debug_locks))
> return 0;
>
> diff --git a/lib/debug_locks.c b/lib/debug_locks.c
> index 124fdf238b3d..4003a18420fb 100644
> --- a/lib/debug_locks.c
> +++ b/lib/debug_locks.c
> @@ -37,6 +37,7 @@ EXPORT_SYMBOL_GPL(debug_locks_silent);
>   */
>  int debug_locks_off(void)
>  {
> +   return 0;
> if (debug_locks && __debug_locks_off()) {
> if (!debug_locks_silent) {
> console_verbose();

OK, this means that debug_locks_off() always returns 0, as if debug_locks was 
already
cleared.

Thus print_deadlock_bug() will do nothing, it does

if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;

iow this means that even if lockdep finds a problem, the problem won't be 
reported.

> [1.492128] BUG: key  not in .data!
> [1.492141] BUG: key  not in .data!
> [1.492152] BUG: key  not in .data!
> [1.492228] BUG: key  not in .data!
> [1.492238] BUG: key  not in .data!
> [1.492248] BUG: key  not in .data!

I guess this is lockdep_init_map() which does printk("BUG:") itself, but due to 
your
change above it doesn't do WARN(1) and thus there is no call trace.

Oleg.

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-15 Thread Oleg Nesterov
On 05/13, Sultan Alsawaf wrote:
>
> On Fri, May 10, 2019 at 05:10:25PM +0200, Oleg Nesterov wrote:
> > I am starting to think I am ;)
> >
> > If you have task1 != task2 this code
> >
> > task_lock(task1);
> > task_lock(task2);
> >
> > should trigger print_deadlock_bug(), task1->alloc_lock and 
> > task2->alloc_lock are
> > the "same" lock from lockdep pov, held_lock's will have the same 
> > hlock_class().
>
> Okay, I've stubbed out debug_locks_off(), and lockdep is now complaining 
> about a
> bunch of false positives so it is _really_ enabled this time.

Could you explain in detail what exactly did you do and what do you see in 
dmesg?

Just in case, lockdep complains only once, print_circular_bug() does 
debug_locks_off()
so it it has already reported another false positive __lock_acquire() will 
simply
return after that.

Oleg.

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-10 Thread Oleg Nesterov
On 05/09, Sultan Alsawaf wrote:
>
> On Thu, May 09, 2019 at 05:56:46PM +0200, Oleg Nesterov wrote:
> > Impossible ;) I bet lockdep should report the deadlock as soon as 
> > find_victims()
> > calls find_lock_task_mm() when you already have a locked victim.
>
> I hope you're not a betting man ;)

I am starting to think I am ;)

If you have task1 != task2 this code

task_lock(task1);
task_lock(task2);

should trigger print_deadlock_bug(), task1->alloc_lock and task2->alloc_lock are
the "same" lock from lockdep pov, held_lock's will have the same hlock_class().

> CONFIG_PROVE_LOCKING=y

OK,

> And a printk added in vtsk_is_duplicate() to print when a duplicate is 
> detected,

in this case find_lock_task_mm() won't be called, and this is what saves us from
the actual deadlock.


> and my phone's memory cut in half to make simple_lmk do something, this is 
> what
> I observed:
> taimen:/ # dmesg | grep lockdep
> [0.00] \x09RCU lockdep checking is enabled.

this reports that CONFIG_PROVE_RCU is enabled ;)

> taimen:/ # dmesg | grep simple_lmk
> [   23.211091] simple_lmk: Killing android.carrier with adj 906 to free 37420 
> kiB
> [   23.211160] simple_lmk: Killing oadcastreceiver with adj 906 to free 36784 
> kiB

yes, looks like simple_lmk has at least 2 locked victims. And I have no idea why
you do not see anything else in dmesg. May be debug_locks_off() was already 
called.

But see above, "grep lockdep" won't work.  Perhaps you can do
"grep -e WARNING -e BUG -e locking".

Oleg.

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-09 Thread Oleg Nesterov
On 05/07, Sultan Alsawaf wrote:
>
> On Tue, May 07, 2019 at 05:31:54PM +0200, Oleg Nesterov wrote:
>
> > Did you test this patch with lockdep enabled?
> >
> > If I read the patch correctly, lockdep should complain. vtsk_is_duplicate()
> > ensures that we do not take the same ->alloc_lock twice or more, but lockdep
> > can't know this.
>
> Yeah, lockdep is fine with this, at least on 4.4.

Impossible ;) I bet lockdep should report the deadlock as soon as find_victims()
calls find_lock_task_mm() when you already have a locked victim.

Nevermind, I guess this code won't run with lockdep enabled...


As for 
https://github.com/kerneltoast/android_kernel_google_wahoo/commit/afc8c9bf2dbde95941253c168d1adb64cfa2e3ad
Well,

mmdrop(mm);
simple_lmk_mm_freed(mm);

looks racy because mmdrop(mm) can free this mm_struct. Yes, 
simple_lmk_mm_freed()
does not dereference this pointer, but the same memory can be re-allocated as
another ->mm for the new task which can be found by find_victims(), and _in 
theory_
this all can happen in between, so the "victims[i].mm == mm" can be false 
positive.

And this also means that simple_lmk_mm_freed() should clear victims[i].mm when
it detects "victims[i].mm == mm", otherwise we have the same theoretical race,
victims_to_kill is only cleared when the last victim goes away.


Another nit... you can drop tasklist_lock right after the 1st "find_victims" 
loop.

And it seems that you do not really need to walk the "victims" array twice 
after that,
you can do everything in a single loop, but this is cosmetic.

Oleg.

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-07 Thread Oleg Nesterov
I am not going to comment the intent, but to be honest I am skeptical too.

On 05/06, Sultan Alsawaf wrote:
>
> +static unsigned long find_victims(struct victim_info *varr, int *vindex,
> +   int vmaxlen, int min_adj, int max_adj)
> +{
> + unsigned long pages_found = 0;
> + int old_vindex = *vindex;
> + struct task_struct *tsk;
> +
> + for_each_process(tsk) {
> + struct task_struct *vtsk;
> + unsigned long tasksize;
> + short oom_score_adj;
> +
> + /* Make sure there's space left in the victim array */
> + if (*vindex == vmaxlen)
> + break;
> +
> + /* Don't kill current, kthreads, init, or duplicates */
> + if (same_thread_group(tsk, current) ||
> + tsk->flags & PF_KTHREAD ||
> + is_global_init(tsk) ||
> + vtsk_is_duplicate(varr, *vindex, tsk))
> + continue;
> +
> + vtsk = find_lock_task_mm(tsk);

Did you test this patch with lockdep enabled?

If I read the patch correctly, lockdep should complain. vtsk_is_duplicate()
ensures that we do not take the same ->alloc_lock twice or more, but lockdep
can't know this.

> +static void scan_and_kill(unsigned long pages_needed)
> +{
> + static DECLARE_WAIT_QUEUE_HEAD(victim_waitq);
> + struct victim_info victims[MAX_VICTIMS];
> + int i, nr_to_kill = 0, nr_victims = 0;
> + unsigned long pages_found = 0;
> + atomic_t victim_count;
> +
> + /*
> +  * Hold the tasklist lock so tasks don't disappear while scanning. This
> +  * is preferred to holding an RCU read lock so that the list of tasks
> +  * is guaranteed to be up to date. Keep preemption disabled until the
> +  * SIGKILLs are sent so the victim kill process isn't interrupted.
> +  */
> + read_lock(_lock);
> + preempt_disable();

read_lock() disables preemption, every task_lock() too, so this looks
unnecessary.

> + for (i = 1; i < ARRAY_SIZE(adj_prio); i++) {
> + pages_found += find_victims(victims, _victims, MAX_VICTIMS,
> + adj_prio[i], adj_prio[i - 1]);
> + if (pages_found >= pages_needed || nr_victims == MAX_VICTIMS)
> + break;
> + }
> +
> + /*
> +  * Calculate the number of tasks that need to be killed and quickly
> +  * release the references to those that'll live.
> +  */
> + for (i = 0, pages_found = 0; i < nr_victims; i++) {
> + struct victim_info *victim = [i];
> + struct task_struct *vtsk = victim->tsk;
> +
> + /* The victims' mm lock is taken in find_victims; release it */
> + if (pages_found >= pages_needed) {
> + task_unlock(vtsk);
> + continue;
> + }
> +
> + /*
> +  * Grab a reference to the victim so it doesn't disappear after
> +  * the tasklist lock is released.
> +  */
> + get_task_struct(vtsk);

The comment doesn't look correct. the victim can't dissapear until task_unlock()
below, it can't pass exit_mm().

> + pages_found += victim->size;
> + nr_to_kill++;
> + }
> + read_unlock(_lock);
> +
> + /* Kill the victims */
> + victim_count = (atomic_t)ATOMIC_INIT(nr_to_kill);
> + for (i = 0; i < nr_to_kill; i++) {
> + struct victim_info *victim = [i];
> + struct task_struct *vtsk = victim->tsk;
> +
> + pr_info("Killing %s with adj %d to free %lu kiB\n", vtsk->comm,
> + vtsk->signal->oom_score_adj,
> + victim->size << (PAGE_SHIFT - 10));
> +
> + /* Configure the victim's mm to notify us when it's freed */
> + vtsk->mm->slmk_waitq = _waitq;
> + vtsk->mm->slmk_counter = _count;
> +
> + /* Accelerate the victim's death by forcing the kill signal */
> + do_send_sig_info(SIGKILL, SIG_INFO_TYPE, vtsk, true);
   
this should be PIDTYPE_TGID

> +
> + /* Finally release the victim's mm lock */
> + task_unlock(vtsk);
> + }
> + preempt_enable_no_resched();

See above. And I don't understand how can _no_resched() really help...

> +
> + /* Try to speed up the death process now that we can schedule again */
> + for (i = 0; i < nr_to_kill; i++) {
> + struct task_struct *vtsk = victims[i].tsk;
> +
> + /* Increase the victim's priority to make it die faster */
> + set_user_nice(vtsk, MIN_NICE);
> +
> + /* Allow the victim to run on any CPU */
> + set_cpus_allowed_ptr(vtsk, cpu_all_mask);
> +
> + /* Finally release the victim reference acquired earlier */
> + put_task_struct(vtsk);
> + }
> +
> + /* Wait