[ 
https://issues.apache.org/jira/browse/MESOS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880624#comment-13880624
 ] 

Vinod Kone commented on MESOS-941:
----------------------------------

https://reviews.apache.org/r/17295/

> Memory limit not correctly set when no memory resource set on executor level
> ----------------------------------------------------------------------------
>
>                 Key: MESOS-941
>                 URL: https://issues.apache.org/jira/browse/MESOS-941
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>            Reporter: Lin Zhao
>            Assignee: Vinod Kone
>             Fix For: 0.17.0
>
>
> When a framework is launched with memory resources set only on the tasks and 
> none set at the executor level, the slave fails to apply the memory control 
> needed to limit the executor's memory usage. The executor process can then 
> use more resident memory than the tasks specify.
> Example framework: https://gist.github.com/lin-zhao/8544495. This framework 
> was tested with Mesos 0.14.2 on CentOS 6, kernel 3.10.11-1.el6.x86_64. 
> According to Benjamin Mahler:
> What's happening is that you're launching an executor with no resources. 
> Consequently, before we fork, we attempt to update the memory control, but we 
> don't call the memory handler since the executor has no memory resources:
> I0121 19:39:01.660071  8566 cgroups_isolator.cpp:516] Launching default 
> (/home/lin/test-executor) in 
> /tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>  with resources  for framework 201401171812-2907575306-5050-19011-0020 in 
> cgroup 
> mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
> I0121 19:39:01.663082  8566 cgroups_isolator.cpp:709] Changing cgroup 
> controls for executor default of framework 
> 201401171812-2907575306-5050-19011-0020 with resources 
> I0121 19:39:01.667129  8566 cgroups_isolator.cpp:1163] Started listening for 
> OOM events for executor default of framework 
> 201401171812-2907575306-5050-19011-0020
> I0121 19:39:01.681857  8566 cgroups_isolator.cpp:568] Forked executor at = 
> 27609
> Then, later, when we are updating the resources for your 128MB task, we set 
> the soft limit, but we don't set the hard limit because the following buggy 
> check is not satisfied:
>   // Determine whether to set the hard limit. If this is the first
>   // time (info->pid.isNone()), or we're raising the existing limit,
>   // then we can update the hard limit safely. Otherwise, if we need
>   // to decrease 'memory.limit_in_bytes' we may induce an OOM if too
>   // much memory is in use. As a result, we only update the soft
>   // limit when the memory reservation is being reduced. This is
>   // probably okay if the machine has available resources.
>   // TODO(benh): Introduce a MemoryWatcherProcess which monitors the
>   // discrepancy between usage and soft limit and introduces a
>   // "manual oom" if necessary.
>   if (info->pid.isNone() || limit > currentLimit.get()) {
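
The sequence described above can be replayed with a small, self-contained model. This is only a sketch under stated assumptions, not the Mesos isolator code: CgroupModel, updateMemory, simulateReportedSequence and kUnlimited are hypothetical names, and the "unlimited" starting value stands in for whatever the kernel reports for a cgroup whose memory.limit_in_bytes was never written. Only the shape of the condition is taken from the quoted snippet.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: stand-in for a cgroup whose hard limit was never written.
// The real kernel default is a very large number; its exact value does
// not matter for this sketch.
constexpr uint64_t kUnlimited = std::numeric_limits<uint64_t>::max();

// Hypothetical, simplified model of the per-executor state.
struct CgroupModel {
  bool pidKnown;       // Models !info->pid.isNone(): true once forked.
  uint64_t softLimit;  // Models the soft memory limit.
  uint64_t hardLimit;  // Models memory.limit_in_bytes.
};

// Mirrors the shape of the quoted check. Returns true if the hard limit
// was written, false if only the soft limit was updated.
inline bool updateMemory(CgroupModel& cg, uint64_t limit) {
  cg.softLimit = limit;  // The soft limit is always updated.
  if (!cg.pidKnown || limit > cg.hardLimit) {
    cg.hardLimit = limit;
    return true;
  }
  return false;
}

// Replays the reported sequence: launch with no memory resources, fork,
// then update for a 128MB task.
inline CgroupModel simulateReportedSequence() {
  CgroupModel cg = {false, kUnlimited, kUnlimited};

  // 1. Launch with empty resources: the memory handler is never called,
  //    so neither limit is written while the pid is still unknown.

  // 2. The executor is forked, so the pid is now known.
  cg.pidKnown = true;

  // 3. Resources are updated for the 128MB task. The pid is known and
  //    128MB is not greater than the unlimited default, so both halves
  //    of the condition are false and the hard limit is left untouched.
  const bool hardLimitSet = updateMemory(cg, 128ull * 1024 * 1024);
  assert(!hardLimitSet);
  (void)hardLimitSet;  // Avoids an unused-variable warning under NDEBUG.

  return cg;
}
```

In this model the soft limit ends up at 128MB while the hard limit stays at the unlimited default, which matches the reported symptom that the executor can exceed the memory specified in its tasks. Had updateMemory been called once before the fork, the first half of the condition would have been true and the hard limit would have been written.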



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
