On Fri, Feb 16, 2018 at 05:17:57PM +0000, Mathieu Desnoyers wrote:
> ----- On Feb 16, 2018, at 11:53 AM, Mark Rutland mark.rutl...@arm.com wrote:
> > I suspect we have a bogus mmdrop or mmput elsewhere, and do_exit() and
> > finish_task_switch() aren't to blame.
>
> Currently reviewing:
>
> fs/proc/base.c: __set_oom_adj()
>
>         /*
>          * Make sure we will check other processes sharing the mm if this is
>          * not vfrok which wants its own oom_score_adj.
>          * pin the mm so it doesn't go away and get reused after task_unlock
>          */
>         if (!task->vfork_done) {
>                 struct task_struct *p = find_lock_task_mm(task);
>
>                 if (p) {
>                         if (atomic_read(&p->mm->mm_users) > 1) {
>                                 mm = p->mm;
>                                 mmgrab(mm);
>                         }
>                         task_unlock(p);
>                 }
>         }
>
> Considering that mmput() done by exit_mm() is done outside of the
> task_lock critical section, I wonder how the mm_users load is
> synchronized ?
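To make the question concrete, here is a minimal userspace sketch (not kernel code), assuming C11 atomics as a stand-in for atomic_read() and mmget(). The struct mm_users_model type and the model_* helpers are hypothetical. Nothing orders the single load in the check against a concurrent user reference taken by another thread sharing the mm (for example via clone(CLONE_VM)), so the value read is only a snapshot. The test replays one such interleaving deterministically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mm_users counter of mm_struct. */
struct mm_users_model {
	atomic_int mm_users;
};

/*
 * Mirrors the check in __set_oom_adj(): one relaxed load, like atomic_read().
 * Holding the target task's lock does not stop another thread sharing the
 * mm from taking a new user reference, so the result is only a snapshot.
 */
static bool model_has_other_users(struct mm_users_model *mm)
{
	return atomic_load_explicit(&mm->mm_users, memory_order_relaxed) > 1;
}

/* Stand-in for mmget() from a concurrent clone(CLONE_VM) in another thread. */
static void model_mmget(struct mm_users_model *mm)
{
	int old = atomic_fetch_add_explicit(&mm->mm_users, 1,
					    memory_order_relaxed);

	assert(old >= 1); /* mmget() is only valid on an mm that still has users */
}
```

In the replayed interleaving the check sees a single user and a second user appears right afterwards, so a caller acting on the snapshot would skip the other user. A decrement racing with the load (the exit_mm() case) is the harmless direction: at worst the mm is pinned when it did not need to be.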
That looks suspicious, but I don't think it can result in this particular
problem.

In find_lock_task_mm() we take the task lock and check that task->mm is not
NULL, which means that mm->mm_count >= 1 (thanks to the implicit reference
that context_switch() + finish_task_switch() manage). While we hold the task
lock, task->mm can't change beneath our feet, and hence that reference can't
be dropped by finish_task_switch(). Thus, immediately after the mmgrab(), we
know mm->mm_count >= 2. That shouldn't drop below 1 until the subsequent
mmdrop(), even after we drop the task lock, unless there's a misplaced
mmdrop() elsewhere. Locally, mmgrab() and mmdrop() are balanced.

However, if mm_users can be incremented behind our back, the unlocked
atomic_read() may observe a count of 1, and we might skip updating the oom
adjustments for other users of the mm.

Thanks,
Mark.
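The mm_count argument above can be modelled with a minimal userspace sketch, assuming a pthread mutex as a stand-in for task_lock() (which takes the task's alloc_lock) and C11 atomics for the counter. The struct task_model and struct mm_model types and the model_* helpers are hypothetical. model_task_exit() collapses the real exit and context-switch path into a single drop, purely for illustration. The mm_users check from __set_oom_adj() is left out to focus on mm_count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the mm_count side of mm_struct. */
struct mm_model {
	atomic_int mm_count;
};

/* Hypothetical task: a non-NULL mm implies one held mm_count reference. */
struct task_model {
	pthread_mutex_t alloc_lock; /* stand-in for task_lock()/task_unlock() */
	struct mm_model *mm;
};

static void model_mmgrab(struct mm_model *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

/* Returns the count after dropping; the real mmdrop() frees at zero. */
static int model_mmdrop(struct mm_model *mm)
{
	int old = atomic_fetch_sub(&mm->mm_count, 1);

	assert(old >= 1); /* an unbalanced extra drop would trip this */
	return old - 1;
}

/*
 * Observe task->mm != NULL under the lock (so mm_count >= 1 while the lock
 * is held), then grab so the count is >= 2 before the lock is released.
 */
static struct mm_model *model_pin_mm(struct task_model *t)
{
	struct mm_model *mm = NULL;

	pthread_mutex_lock(&t->alloc_lock);
	if (t->mm) {
		mm = t->mm;
		model_mmgrab(mm);
		assert(atomic_load(&mm->mm_count) >= 2);
	}
	pthread_mutex_unlock(&t->alloc_lock);
	return mm;
}

/* Simplified exit: clear task->mm under the lock, then drop its reference. */
static void model_task_exit(struct task_model *t)
{
	struct mm_model *mm;

	pthread_mutex_lock(&t->alloc_lock);
	mm = t->mm;
	t->mm = NULL;
	pthread_mutex_unlock(&t->alloc_lock);
	if (mm)
		model_mmdrop(mm);
}
```

Because the task's reference is only dropped after task->mm has been cleared under the same lock, a pin taken while task->mm was visible always leaves the count at 1 or more until the pinning caller performs its own matching drop. In this model the count can only reach zero early if some path drops a reference it never took.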