On 05/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <o...@redhat.com> writes:
>
> > On 05/02, Eric W. Biederman wrote:
> >>
> >> +static void mem_cgroup_fork(struct task_struct *tsk)
> >> +{
> >> +	struct cgroup_subsys_state *css;
> >> +
> >> +	rcu_read_lock();
> >> +	css = task_css(tsk, memory_cgrp_id);
> >> +	if (css && css_tryget(css))
> >> +		task_update_memcg(tsk, mem_cgroup_from_css(css));
> >> +	rcu_read_unlock();
> >> +}
> >
> > Why do we need it?
> >
> > The child's mm->memcg was already initialized by mm_init_memcg() and we can't
> > race with migrate until cgroup_threadgroup_change_end() ?
>
> I admit I missed the cgroup_threadgroup_change_begin
> cgroup_threadgroup_change_end pair in fs fork.  In this case it doesn't
> matter because mm_init_memcg is called from:
>
> copy_mm
>   dup_mm
>     mm_init
>
> And copy_mm is called before we call cgroup_threadgroup_change_begin.
> So the race remains.

Ah yes, you are right.

> We could move cgroup_threadgroup_change_begin earlier, to remove
> the need for mem_cgroup_fork.  But I have not analyzed that.

No, cgroup_threadgroup_change_begin() was called early and this was wrong, see
568ac888215c7fb2fabe8ea739b00ec3c1f5d440. Actually there were more problems,
say copy_net() could deadlock because cleanup_net() does do_wait() with
net_mutex held.

OK, what about exec() ? mm_init_memcg() initializes bprm->mm->memcg early in
bprm_mm_init(). What if the execing task migrates before exec_mmap() ?

Oleg.