[ https://issues.apache.org/jira/browse/MESOS-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900567#comment-16900567 ]
Qian Zhang commented on MESOS-9925:
-----------------------------------
The root cause is that Marathon and mesos-execute usually give the default
executor only 0.1 cpus, which is too little for it to start quickly. In the
executor container’s CPU cgroup, I see this:
{code:java}
$ cat /sys/fs/cgroup/cpuacct/mesos/bd5bc588-7565-4c7e-a5f0-d33850b2ec0a/cpu.stat
nr_periods 118
nr_throttled 37
throttled_time 633829202
{code}
`nr_throttled 37` means the container was throttled in 37 of its 118 CFS
periods. If I change the default executor’s CPU from 0.1 to 1.0, or set
`--cgroups_enable_cfs` to false, the issue goes away: `nr_throttled` is 0 and
the default executor starts and subscribes very quickly (~0.5s).
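
To make the counters above easier to read, here is a rough sketch that turns
them into a throttling ratio and a total throttled time. It is not Mesos code:
the helper names are hypothetical, it assumes `throttled_time` is reported in
nanoseconds (as in cgroup v1 `cpu.stat`), and it assumes a 100 ms CFS period
when estimating the quota that 0.1 cpus would get.

```python
# Hypothetical helpers for illustration only; none of these names exist in Mesos.

# The cpu.stat contents quoted in this comment.
SAMPLE = """\
nr_periods 118
nr_throttled 37
throttled_time 633829202
"""


def parse_cpu_stat(text):
    """Parse 'key value' lines from a cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_summary(stats):
    """Return (fraction of periods throttled, total throttled seconds)."""
    periods = stats["nr_periods"]
    ratio = stats["nr_throttled"] / periods if periods else 0.0
    # Assumption: throttled_time is in nanoseconds (cgroup v1).
    seconds = stats["throttled_time"] / 1e9
    return ratio, seconds


def cfs_quota_us(cpus, period_us=100_000):
    """Estimate the CFS quota in microseconds per period.

    Assumption: a 100 ms period, with the quota scaling linearly with cpus.
    """
    return round(cpus * period_us)


if __name__ == "__main__":
    ratio, seconds = throttle_summary(parse_cpu_stat(SAMPLE))
    print(f"throttled in {ratio:.0%} of periods, {seconds:.2f}s in total")
    print(f"estimated quota for 0.1 cpus: {cfs_quota_us(0.1)} us per period")
```

Under those assumptions, 0.1 cpus is about 10 ms of CPU time per 100 ms
period, so startup work that needs more than that is spread over many periods,
which is consistent with the throttling seen above.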
> Default executor takes a couple of seconds to start and subscribe Mesos agent
> -----------------------------------------------------------------------------
>
> Key: MESOS-9925
> URL: https://issues.apache.org/jira/browse/MESOS-9925
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: Qian Zhang
> Priority: Major
> Labels: containerization
>
> When launching a task group, it may take 6 seconds for the default executor
> to start and subscribe to the Mesos agent:
> {code:java}
> # Agent log:
> I0730 01:18:57.908911 10107 containerizer.cpp:3302] Transitioning the state of container 593f6750-e36d-4838-89c7-34c77b30ba99 from FETCHING to RUNNING
> I0730 01:19:03.829246 10073 http.cpp:1115] HTTP POST for /slave(1)/api/v1/executor from 10.0.49.2:36798
> # Executor stderr:
> Marked '/' as rslave
> I0730 01:19:03.617830 10438 executor.cpp:206] Version: 1.9.0
> I0730 01:19:03.842535 10464 default_executor.cpp:205] Received SUBSCRIBED event
> {code}
> This is clearly too long and may hurt the performance of launching task
> groups.