[jira] [Commented] (MESOS-9283) Docker containerizer actor can get backlogged with large number of containers.

Andrei Budnik (JIRA) Fri, 12 Oct 2018 06:15:24 -0700


    [ 
https://issues.apache.org/jira/browse/MESOS-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647892#comment-16647892
 ]


Andrei Budnik commented on MESOS-9283:
--------------------------------------

Re-opened this as we need to backport /r/68923 to Mesos 1.4.x

> Docker containerizer actor can get backlogged with large number of containers.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-9283
>                 URL: https://issues.apache.org/jira/browse/MESOS-9283
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.5.1, 1.6.1, 1.7.0
>            Reporter: Jie Yu
>            Assignee: Greg Mann
>            Priority: Blocker
>              Labels: perfomance
>             Fix For: 1.8.0
>
>         Attachments: Screen Shot 2018-10-01 at 10.54.18 PM.png
>
>
> We observed this issue during internal scale testing.
> When launching 300+ Docker containers on a single agent box, the Docker 
> containerizer actor can get backlogged. As a result, API calls such as 
> `GET_CONTAINERS` become unresponsive. The backlog also blocks the Mesos 
> containerizer from launching containers if the agent is started with 
> `--containerizers=docker,mesos`, because the composing containerizer invokes 
> (and queues) the Docker containerizer launch first.
> Profiling results show that the bottleneck is `os::killtree`, which is 
> invoked when the Docker commands are discarded (e.g., on client disconnect).
> In this particular case, `os::killtree` is not really necessary because the 
> docker command does not fork additional subprocesses. If we use the argv 
> version of `subprocess` to launch docker commands, we can simply use 
> `os::kill` instead. We confirmed that, after switching to `os::kill`, the 
> performance issue goes away and the agent can easily scale to 300+ containers.
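
For readers unfamiliar with the distinction in the description above, here is a
minimal sketch of the idea using plain POSIX calls rather than the actual
stout/libprocess code. The helper names `launchArgv` and `killAndReap` are
hypothetical and do not appear in Mesos or in /r/68923. The point is only that
a command exec'ed directly from an argv vector (no intermediate shell) is a
single process, so one kill(2) call is enough and no process-tree walk is
needed.

```cpp
#include <cerrno>
#include <csignal>
#include <string>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not a Mesos API): launch a command directly from an
// argv vector. No shell is involved, so the returned pid is the command
// itself rather than a wrapper process with children of its own.
pid_t launchArgv(const std::vector<std::string>& args)
{
  // Build the NULL-terminated argv array before forking.
  std::vector<char*> argv;
  for (const std::string& arg : args) {
    argv.push_back(const_cast<char*>(arg.c_str()));
  }
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid == 0) {
    execvp(argv[0], argv.data());
    _exit(127); // Only reached if exec failed.
  }

  return pid; // -1 if fork failed.
}

// Hypothetical helper (not a Mesos API): terminate exactly one process with
// a single signal and reap it. Returns the signal that terminated the
// process, or -1 on error or if the process exited on its own.
int killAndReap(pid_t pid, int sig = SIGKILL)
{
  if (kill(pid, sig) != 0) {
    return -1;
  }

  int status = 0;
  pid_t reaped;
  do {
    reaped = waitpid(pid, &status, 0);
  } while (reaped == -1 && errno == EINTR);

  if (reaped != pid) {
    return -1;
  }

  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

For example, `killAndReap(launchArgv({"sleep", "30"}))` sends one signal to one
pid. A tree-killing approach would instead have to discover descendants before
signalling them, which is the kind of extra work the profiling in the
description points at. This sketch assumes the launched command does not fork
children; if it did, they would be left running.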



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
