On 2017-03-13 06:28, Benoit GEORGELIN - Association Web4all wrote:
Hi lxc-users ,

I would like to know if you have any experience with a large number of
LXC/LXD containers, in terms of performance, stability and limitations?

I'm wondering, for example, whether 100 containers behave the same as
1,000 or 10,000 with the same configuration, leaving aside what the
containers are actually used for.

I'm running LXD on several servers and I'm generally satisfied with it: performance and stability are fine. Most of them run fewer than 50 containers, though.

I also have an LXD server which runs 100+ containers; it starts, stops and deletes dozens of containers daily and is used for automation. Approximately once every 1-2 months, the "lxc stop" / "lxc restart" commands start to fail, which is a bit of a stability concern for us.

The cause is unclear. In the LXD log for the container, the only thing logged is:

lxc 20170301115514.738 WARN lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive response: Connection reset by peer.
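
To check whether the same warning shows up in other containers' logs, something like the following could be used. This is only a rough sketch: the regular expression is modeled on the single log line quoted above, so other LXC log lines may not match it, and the function name find_cmd_failures is made up for illustration.

```python
import re

# Pattern modeled on the one log line quoted in this message (an assumption:
# other LXC versions or log lines may use a different layout).
LOG_RE = re.compile(
    r"^lxc (?P<ts>\d{14}\.\d{3}) (?P<level>[A-Z]+)\s+(?P<module>\S+) - "
    r"(?P<src>\S+) - (?P<msg>.*)$"
)

# Matches the warning text itself and captures the command name (e.g. get_cgroup).
CMD_RE = re.compile(r"Command (\S+) failed to receive response")


def find_cmd_failures(lines):
    """Return (timestamp, command) for each 'failed to receive response' line."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # not in the expected layout, skip it
        cmd = CMD_RE.search(m.group("msg"))
        if cmd:
            hits.append((m.group("ts"), cmd.group(1)))
    return hits
```

Feeding it the lines of a container log returns one tuple per matching warning, which makes it easier to see when the problem first appeared.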

Once it starts happening, it affects all containers: "lxc stop" / "lxc restart" will hang for any of the running containers. Interestingly, "lxc stop" does stop the container; the command just never returns. "lxc restart" likewise only stops the container: the command does not return and the container is not started again.
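
For automation, a hang like the one described above can at least be detected instead of blocking the whole job. The following is a minimal sketch, not something taken from our setup: the helper names and the 120-second default are arbitrary assumptions, and only the plain "lxc stop <name>" invocation comes from the description above.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return 'ok', 'failed', or 'hung' if it exceeds timeout_s seconds."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # on expiry the child is killed and TimeoutExpired is raised
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


def stop_container(name, timeout_s=120):
    # Hypothetical wrapper: 120 s is an arbitrary guess, not a recommended value.
    return run_with_timeout(["lxc", "stop", name], timeout_s)
```

A "hung" result only says that the client command did not return in time. As noted above, the container itself may well have been stopped, so its state should be checked separately before retrying.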

The only thing that fixes it is a server restart.

There is also no clear way to reproduce it reliably (other than running the server for a long time and starting/stopping a large number of containers over that period...).

I think it's a kernel issue, but unfortunately I was not able to debug it any further.

Tomasz Chmielewski
lxc-users mailing list
