[jira] [Commented] (MESOS-9925) Default executor takes a couple of seconds to start and subscribe Mesos agent

Qian Zhang (JIRA) Mon, 05 Aug 2019 20:14:10 -0700


    [ 
https://issues.apache.org/jira/browse/MESOS-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900567#comment-16900567
 ]


Qian Zhang commented on MESOS-9925:
-----------------------------------

The root cause is that Marathon and mesos-execute usually give the default 
executor only 0.1 cpus, which is too little for it to start quickly. In the 
executor container’s CPU cgroup, I see this:
{code:java}
$ cat /sys/fs/cgroup/cpuacct/mesos/bd5bc588-7565-4c7e-a5f0-d33850b2ec0a/cpu.stat
nr_periods 118
nr_throttled 37
throttled_time 633829202
{code}
`nr_throttled 37` means the container was throttled in 37 of its 118 CFS 
periods. If I change the default executor’s CPU from 0.1 to 1.0, or set 
`--cgroups_enable_cfs` to false, the issue goes away: `nr_throttled` is 0 and 
the default executor starts and subscribes very quickly (~0.5s).
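
To put the numbers above in perspective, here is a minimal Python sketch that summarizes the `cpu.stat` output. It assumes cgroup v1 semantics, where `throttled_time` is reported in nanoseconds, and it assumes the CFS quota is simply `cpus * period` with a 100 ms period. The helper names (`parse_cpu_stat`, `summarize`, `cfs_quota_us`) are made up for illustration and are not part of Mesos.

```python
# Sketch: summarize CFS throttling from a cgroup v1 cpu.stat dump.
# The sample text is copied from the cpu.stat output quoted above.
CPU_STAT = """\
nr_periods 118
nr_throttled 37
throttled_time 633829202
"""


def parse_cpu_stat(text):
    """Parse 'key value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def summarize(stats):
    """Return (fraction of periods throttled, total throttled seconds)."""
    periods = stats["nr_periods"]
    ratio = stats["nr_throttled"] / periods if periods else 0.0
    # Assumption: cgroup v1 reports throttled_time in nanoseconds.
    seconds = stats["throttled_time"] / 1e9
    return ratio, seconds


def cfs_quota_us(cpus, period_us=100_000):
    """Assumed quota model: quota = cpus * period (100 ms period)."""
    return int(round(cpus * period_us))


stats = parse_cpu_stat(CPU_STAT)
ratio, seconds = summarize(stats)
print(f"throttled in {ratio:.0%} of periods, {seconds:.3f}s in total")
print(f"quota for 0.1 cpus: {cfs_quota_us(0.1)} us per 100000 us period")
```

Under these assumptions, 0.1 cpus allows about 10 ms of CPU time per 100 ms period, and the container was throttled in roughly 31% of its periods, for about 0.63s in total at the time the file was read.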

> Default executor takes a couple of seconds to start and subscribe Mesos agent
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-9925
>                 URL: https://issues.apache.org/jira/browse/MESOS-9925
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Qian Zhang
>            Priority: Major
>              Labels: containerization
>
> When launching a task group, it may take about 6 seconds for the default 
> executor to start and subscribe to the Mesos agent:
> {code:java}
> # Agent log:
> I0730 01:18:57.908911 10107 containerizer.cpp:3302] Transitioning the state of container 593f6750-e36d-4838-89c7-34c77b30ba99 from FETCHING to RUNNING
> I0730 01:19:03.829246 10073 http.cpp:1115] HTTP POST for /slave(1)/api/v1/executor from 10.0.49.2:36798
> # Executor stderr:
> Marked '/' as rslave
> I0730 01:19:03.617830 10438 executor.cpp:206] Version: 1.9.0
> I0730 01:19:03.842535 10464 default_executor.cpp:205] Received SUBSCRIBED event
> {code}
> This is clearly too long and may hurt the performance of launching task 
> groups.
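
The delay in the quoted logs can be measured directly from the timestamps. The following Python sketch parses the glog-style `MMDD HH:MM:SS.ffffff` prefixes and prints the gaps. It assumes the agent and executor logs share one clock (same host), and `parse_glog_time` is a hypothetical helper written for this example.

```python
from datetime import datetime


def parse_glog_time(line):
    """Parse the timestamp of a glog line such as 'I0730 01:18:57.908911 ...'.

    The year is not logged, so only differences between stamps are meaningful.
    """
    parts = line.split()
    stamp = parts[0][1:] + " " + parts[1]  # drop the severity letter
    return datetime.strptime(stamp, "%m%d %H:%M:%S.%f")


# Timestamps copied from the agent log and executor stderr quoted above.
running = parse_glog_time("I0730 01:18:57.908911 10107 containerizer.cpp:3302]")
first_log = parse_glog_time("I0730 01:19:03.617830 10438 executor.cpp:206]")
post = parse_glog_time("I0730 01:19:03.829246 10073 http.cpp:1115]")
subscribed = parse_glog_time("I0730 01:19:03.842535 10464 default_executor.cpp:205]")

to_first_log = (first_log - running).total_seconds()
to_post = (post - running).total_seconds()
to_subscribed = (subscribed - first_log).total_seconds()

print(f"RUNNING -> first executor log line: {to_first_log:.6f}s")
print(f"RUNNING -> subscribe HTTP POST:     {to_post:.6f}s")
print(f"first log line -> SUBSCRIBED:       {to_subscribed:.6f}s")
```

Read this way, most of the roughly 6 seconds passes before the executor writes its first log line, and subscribing itself takes well under a second once the executor is running.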



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
