On 08/21, Oleg Nesterov wrote: > > On 08/21, Suren Baghdasaryan wrote: > > > > On Fri, Aug 21, 2020 at 4:16 AM Oleg Nesterov <o...@redhat.com> wrote: > > > > > > bool probably_has_other_mm_users(tsk) > > > { > > > return atomic_read_acquire(&tsk->mm->mm_users) > > > > atomic_read(&tsk->signal->live); > > > } > > > > > > The barrier implied by _acquire ensures that if we race with the exiting > > > task and see the result of exit_mm()->mmput(mm), then we must also see > > > the result of atomic_dec_and_test(signal->live). > > > > > > Either way, if we want to fix the race with clone(CLONE_VM) we need other > > > changes. > > > > The way I understand this condition in __set_oom_adj() sync logic is > > that we would be ok with false positives (when we loop unnecessarily) > > but we can't tolerate false negatives (when oom_score_adj gets out of > > sync). > > Yes, > > > With the clone(CLONE_VM) race not addressed we are allowing > > false negatives and IMHO that's not acceptable because it creates a > > possibility for userspace to get an inconsistent picture. When > > developing the patch I did think about using (p->mm->mm_users > > > p->signal->nr_threads) condition and had to reject it due to that > > reason. > > Not sure I understand... I mean, the test_bit(MMF_PROC_SHARED) you propose > is equally racy and we need copy_oom_score() at the end of copy_process() > either way?
On a second thought I agree that probably_has_other_mm_users() above can't work ;) Compared to the test_bit(MMF_PROC_SHARED) check it is not _equally_ racy, it adds _another_ race with clone(CLONE_VM). Suppose a single-threaded process P does clone(CLONE_VM); // creates the child C // mm_users == 2; P->signal->live == 1; clone(CLONE_THREAD | CLONE_VM); // mm_users == 3; P->signal->live == 2; the problem is that in theory clone(CLONE_THREAD | CLONE_VM) can increment _both_ counters between atomic_read_acquire(mm_users) and atomic_read(live) in probably_has_other_mm_users() so it can observe mm_users == live == 2. Oleg.