Oleg Nesterov <o...@redhat.com> writes:

> On 05/03, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <o...@redhat.com> writes:
>>
>> > On 05/02, Eric W. Biederman wrote:
>> >>
>> >> +static void mem_cgroup_fork(struct task_struct *tsk)
>> >> +{
>> >> +        struct cgroup_subsys_state *css;
>> >> +
>> >> +        rcu_read_lock();
>> >> +        css = task_css(tsk, memory_cgrp_id);
>> >> +        if (css && css_tryget(css))
>> >> +                task_update_memcg(tsk, mem_cgroup_from_css(css));
>> >> +        rcu_read_unlock();
>> >> +}
>> >
>> > Why do we need it?
>> >
>> > The child's mm->memcg was already initialized by mm_init_memcg() and we
>> > can't race with migrate until cgroup_threadgroup_change_end() ?
>>
>> I admit I missed the cgroup_threadgroup_change_begin
>> cgroup_threadgroup_change_end pair in fs fork.  In this case it doesn't
>> matter because mm_init_memcg is called from:
>>
>> copy_mm
>>   dup_mm
>>     mm_init
>>
>> And copy_mm is called before we call cgroup_threadgroup_change_begin.
>> So the race remains.
>
> Ah yes, you are right.
>
>> We could move cgroup_threadgroup_change_begin earlier, to remove
>> the need for mem_cgroup_fork.  But I have not analyzed that.
>
> No, cgroup_threadgroup_change_begin() was called early and this was wrong, see
> 568ac888215c7fb2fabe8ea739b00ec3c1f5d440. Actually there were more problems,
> say copy_net() could deadlock because cleanup_net() does do_wait() with
> net_mutex held.
>
>
> OK, what about exec() ? mm_init_memcg() initializes bprm->mm->memcg early in
> bprm_mm_init(). What if the execing task migrates before exec_mmap() ?

We need the cgroup when the mm is initialized, so that the cgroup
information is available while the mm is being set up.

I don't know which is the better approach: a lock that prevents the
cgroup from changing during exec, or just a little bit of code in
exec_mmap to ensure mm->memcg is properly set.  I have not analyzed
that code path.

This does look like a very good place for an incremental patch to close
that race.

Eric