Oleg Nesterov <o...@redhat.com> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <o...@redhat.com> writes:
>>
>> > I'd vote for the change in exec_mmap(). This way mm_init_memcg() can just
>> > nullify mm->memcg.
>>
>> There is at least one common path where we need the memory control group
>> properly initialized so memory allocations don't escape the memory
>> control group.
>>
>> do_execveat_common
>>    copy_strings
>>       get_arg_page
>>          get_user_pages_remote
>>             __get_user_pages_locked
>>                __get_user_pages
>>                   faultin_page
>>                      handle_mm_fault
>>                         count_memcg_event_mm
>>                         __handle_mm_fault
>>                           handle_pte_fault
>>                              do_anonymous_page
>>                                 mem_cgroup_try_charge
>>
>> I am surprised I can't easily find more.   Apparently in load_elf_binary
>> we call elf_mmap after set_new_exec and install_exec_creds, making
>> a graceful recovery from elf_mmap failures impossible.
>>
>> In any case we most definitely need the memory control group properly
>> setup before exec_mmap.
>
> Confused ...
>
> new_mm->memcg has no effect until exec_mmap(), why it can't be NULL ?

new_mm->memcg does have an effect before exec_mmap.

> and why do you think mem_cgroup_try_charge() can use the wrong memcg
> in this case?

It would only use the wrong memcg if you had bprm->mm->memcg == NULL.

mm_init_memcg is at the same point as mm_init_owner.  So my change did
not introduce any logic changes on when the memory control group became
valid.

get_user_pages_remote is passed bprm->mm.  So all of the above happens
on bprm->mm: bprm->mm->memcg is charged, and the charge fails if the
memory control group is full.
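
To make that concrete, here is a rough user-space model of the charge
path, not kernel code.  The struct and function names (memcg_model,
mm_model, model_try_charge, model_copy_strings) are made up for
illustration, and treating a NULL memcg as "the charge goes unaccounted"
is an assumption of the sketch, not a statement of what the kernel does
in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
struct memcg_model {
	size_t limit;	/* maximum number of pages the group may hold */
	size_t usage;	/* pages charged so far */
};

struct mm_model {
	struct memcg_model *memcg;	/* NULL models an uninitialized mm->memcg */
};

/*
 * Model of charging nr_pages to the group attached to mm.
 * Returns 0 on success, -1 if the group is full.
 * Assumption: with memcg == NULL the allocation is simply not accounted,
 * i.e. it escapes the intended group.
 */
static int model_try_charge(struct mm_model *mm, size_t nr_pages)
{
	struct memcg_model *cg = mm->memcg;

	if (cg == NULL)
		return 0;
	if (cg->usage + nr_pages > cg->limit)
		return -1;
	cg->usage += nr_pages;
	return 0;
}

/*
 * Model of copy_strings faulting argument pages into bprm->mm before
 * exec_mmap: each page is charged through the mm that was passed in.
 */
static int model_copy_strings(struct mm_model *bprm_mm, size_t nr_arg_pages)
{
	for (size_t i = 0; i < nr_arg_pages; i++) {
		if (model_try_charge(bprm_mm, 1) != 0)
			return -1;
	}
	return 0;
}
```

With a group limited to two pages, copying two argument pages succeeds
and a third fails.  With memcg left NULL, any number of pages "succeeds"
in this model and the group's usage never moves, which is the escape I
am worried about.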


It would actually be pointless to allocate and initialize bprm->mm as
early as we do if we were not using it before exec_mmap.  Using bprm->mm
implies there will be charges to the bprm->mm->memcg.

If bprm->mm were not used until exec_mmap we could just delay the
allocation until there and avoid these kinds of challenges.

Eric
