[ https://issues.apache.org/jira/browse/MESOS-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637831#comment-16637831 ]
Greg Mann commented on MESOS-9283:
----------------------------------

Update for {{Docker::inspect()}}, which I'm planning to backport:
https://reviews.apache.org/r/68923/

Update for the entire Docker library; not planning to backport:
https://reviews.apache.org/r/68924/

> Docker containerizer actor can get backlogged with large number of containers.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-9283
>                 URL: https://issues.apache.org/jira/browse/MESOS-9283
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.5.1, 1.6.1, 1.7.0
>            Reporter: Jie Yu
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: perfomance
>         Attachments: Screen Shot 2018-10-01 at 10.54.18 PM.png
>
>
> We observed this during some internal scale testing.
> When launching 300+ Docker containers on a single agent box, it's possible
> that the Docker containerizer actor gets backlogged. As a result, API
> processing like `GET_CONTAINERS` will become unresponsive. It'll also block
> the Mesos containerizer from launching containers if one specifies
> `--containerizers=docker,mesos`, because the Docker containerizer launch will
> be invoked first by the composing containerizer (and queued).
> Profiling results show that the bottleneck is `os::killtree`, which is
> invoked when the Docker commands are discarded (e.g., on client disconnect).
> For this particular case, killtree is not really necessary because the docker
> command does not fork additional subprocesses. If we use the argv version of
> `subprocess` to launch docker commands, we can simply use `os::kill` instead.
> We confirmed that, by switching to `os::kill`, the performance issue goes
> away, and the agent can easily scale up to 300+ containers.
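
For illustration only, the idea in the description (launch by argv, then signal a single pid instead of walking a process tree) can be sketched with plain POSIX calls. This is not the Mesos patch and does not use libprocess `subprocess` or stout's `os::killtree`; `launchArgv` and `killAndReap` are hypothetical names. It assumes the launched command does not fork children of its own, which is the premise stated in the issue.

```cpp
#include <cerrno>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: fork and exec the command directly from argv.
// No shell sits in between, so the returned pid is the command itself
// and there is no process tree to traverse later.
pid_t launchArgv(const std::vector<std::string>& argv)
{
  if (argv.empty()) {
    errno = EINVAL;
    return -1;
  }

  // Build the null-terminated char* array that execvp expects.
  std::vector<char*> args;
  for (const std::string& arg : argv) {
    args.push_back(const_cast<char*>(arg.c_str()));
  }
  args.push_back(nullptr);

  pid_t pid = ::fork();
  if (pid == 0) {
    ::execvp(args[0], args.data());
    ::_exit(127); // Only reached if exec failed.
  }

  return pid; // -1 if fork failed.
}

// Hypothetical helper: send SIGKILL to exactly one pid and reap it.
// Returns the signal that terminated the child, or -1 on error or if
// the child had already exited on its own.
int killAndReap(pid_t pid)
{
  if (::kill(pid, SIGKILL) != 0) {
    return -1;
  }

  int status = 0;
  pid_t reaped;
  do {
    reaped = ::waitpid(pid, &status, 0);
  } while (reaped == -1 && errno == EINTR);

  if (reaped != pid) {
    return -1;
  }

  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On a system with a `sleep` binary on the `PATH`, `killAndReap(launchArgv({"sleep", "30"}))` is expected to return `SIGKILL`. The cost is one `kill(2)` call per discarded command, rather than a scan of the process table to discover descendants.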