Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-04-06 Thread Tomasz Chmielewski

On 2017-04-07 06:41, Serge E. Hallyn wrote:


> would you mind opening an issue for this at github.com/lxc/lxd/issues?
> Just add in all the info you have and, if I understand right that you
> can't put time into further reproductions, just say so up top so
> hopefully we won't bug you too much.


Here it is:

https://github.com/lxc/lxd/issues/3159


I can try reproducing that if you have any ideas how to do it.

And/or, what exactly to run if it hangs again to get some more debugging 
- note I'll have to run it relatively quickly, then will have to restart 
the server - meaning, most likely no time for any interaction on the 
mailing list / github.
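
For the "what exactly to run" part, here is a minimal sketch of a one-shot collector that could be started right before the restart. It assumes a root shell with the usual procps/coreutils tools; the function name, the file names and the 10 second limit are made up for illustration and were not requested by anyone in this thread.

```shell
#!/bin/sh
# Hypothetical helper: grab some state quickly before a forced reboot.
# File names and the 10 s limit are assumptions, not an LXD convention.

snapshot_hang() {
    out="$1"              # directory to write into
    container="${2:-}"    # optional: name of one affected container

    mkdir -p "$out" || return 1

    # Kernel log: hung-task warnings or oopses would show up here.
    dmesg > "$out/dmesg.txt" 2>&1 || true

    # Process list; state D (uninterruptible sleep) marks tasks stuck in the kernel.
    ps -eo pid,stat,wchan,args > "$out/ps.txt" 2>&1 || true
    awk '$2 ~ /^D/' "$out/ps.txt" > "$out/ps-dstate.txt"

    # Kernel stack of each D-state task (normally readable by root only).
    for pid in $(awk '{ print $1 }' "$out/ps-dstate.txt"); do
        cat "/proc/$pid/stack" > "$out/stack-$pid.txt" 2>&1 || true
    done

    # Container log via the lxc client, wrapped in timeout because the
    # client itself may hang while the host is in this state.
    if [ -n "$container" ] && command -v lxc > /dev/null 2>&1; then
        timeout 10 lxc info "$container" --show-log > "$out/lxc-info.txt" 2>&1 || true
    fi
}

# Usage:
#   snapshot_hang "/root/lxd-hang-$(date +%Y%m%d-%H%M%S)" some-container
```

Everything is written to plain files, so the directory can be attached to the GitHub issue after the server is back up.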



Tomasz Chmielewski
https://lxadm.com
___
lxc-users mailing list
lxc-users@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-users

Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-04-06 Thread Serge E. Hallyn
Quoting Tomasz Chmielewski (man...@wpkg.org):
> On 2017-03-13 06:28, Benoit GEORGELIN - Association Web4all wrote:
> > Hi lxc-users,
> >
> > I would like to know if you have any experience with a large number of
> > LXC/LXD containers, in terms of performance, stability and limitations.
> >
> > I'm wondering, for example, whether 100 containers behave the same as
> > 1,000 or 10,000 with the same configuration (leaving aside what the
> > containers are actually used for).
>
> I'm running LXD on several servers and I'm generally satisfied with
> it - performance and stability are fine. They mostly run <50 containers
> though.
>
> I also have an LXD server which runs 100+ containers, which
> starts/stops/deletes dozens of containers daily and is used for
> automation. Approximately once every 1-2 months, the "lxc stop" / "lxc
> restart" commands will fail, which is a bit of a stability concern for
> us.
>
> The cause is unclear. In the LXD log for the container, the only thing
> logged is:
>
> lxc 20170301115514.738 WARN lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive response: Connection reset by peer.
>
> When it starts to happen, it affects all containers - "lxc stop" /
> "lxc restart" will hang for any of the running containers. Interestingly,
> the container does get stopped with "lxc stop"; the command just never
> returns. In the "lxc restart" case, it will just stop the container (the
> command will not return and will not start the container again).
>
> The only thing which fixes that is a server restart.
> 
> There is also no clear way to reproduce it reliably (other than
> running the server for long, and starting/stopping a large number of
> containers over that time...).
> 
> I think it's some kernel issue, but unfortunately I was not able to
> debug this any further.

Hi,

would you mind opening an issue for this at github.com/lxc/lxd/issues?
Just add in all the info you have and, if I understand right that you
can't put time into further reproductions, just say so up top so
hopefully we won't bug you too much.

Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-04-05 Thread Tomasz Chmielewski

On 2017-03-13 06:28, Benoit GEORGELIN - Association Web4all wrote:

> Hi lxc-users,
>
> I would like to know if you have any experience with a large number of
> LXC/LXD containers, in terms of performance, stability and limitations.
>
> I'm wondering, for example, whether 100 containers behave the same as
> 1,000 or 10,000 with the same configuration (leaving aside what the
> containers are actually used for).


I'm running LXD on several servers and I'm generally satisfied with it -
performance and stability are fine. They mostly run <50 containers though.


I also have an LXD server which runs 100+ containers, which
starts/stops/deletes dozens of containers daily and is used for
automation. Approximately once every 1-2 months, the "lxc stop" / "lxc
restart" commands will fail, which is a bit of a stability concern for us.


The cause is unclear. In the LXD log for the container, the only thing
logged is:


lxc 20170301115514.738 WARN lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive response: Connection reset by peer.


When it starts to happen, it affects all containers - "lxc stop" / "lxc
restart" will hang for any of the running containers. Interestingly, the
container does get stopped with "lxc stop"; the command just never
returns. In the "lxc restart" case, it will just stop the container (the
command will not return and will not start the container again).


The only thing which fixes that is a server restart.

There is also no clear way to reproduce it reliably (other than running 
the server for long, and starting/stopping a large number of containers 
over that time...).
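
The reproduction described above (starting and stopping many containers over a long time) could be scripted roughly as follows. This is only a sketch under assumptions: the `churn` function, the `churn-N` container names, the `ubuntu:xenial` image and the 120 second limit are illustrative choices, and the limit may be too short for the first launch, which has to download the image.

```shell
#!/bin/sh
# Hypothetical stress loop: churn containers until an lxc command fails
# or stops returning. LXC, IMAGE, the names and the limit are assumptions.
LXC="${LXC:-lxc}"
IMAGE="${IMAGE:-ubuntu:xenial}"

churn() {
    rounds="$1"
    i=1
    while [ "$i" -le "$rounds" ]; do
        name="churn-$i"
        for step in "launch $IMAGE $name" "stop $name" "delete $name"; do
            # $LXC and $step are left unquoted on purpose so that they
            # split into separate arguments. timeout(1) exits with 124
            # when it had to kill a command that never returned.
            timeout 120 $LXC $step || {
                rc=$?
                echo "round $i: '$LXC $step' failed with exit code $rc" >&2
                return "$rc"
            }
        done
        i=$((i + 1))
    done
}

# Usage:
#   churn 10000
```

An exit code of 124 from the loop would correspond to the symptom reported here, a command that does its work but never returns.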


I think it's some kernel issue, but unfortunately I was not able to 
debug this any further.




Tomasz Chmielewski
https://lxadm.com

Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-04-04 Thread Benoit GEORGELIN - Association Web4all
- Original message -
> From: "Benoit GEORGELIN, web4all" <benoit.george...@web4all.fr>
> To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
> Sent: Tuesday, 28 March 2017 11:20:48
> Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers

> - Original message -
> > From: "David Favor" <da...@davidfavor.com>
> > To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
> > Sent: Monday, 27 March 2017 12:55:09
> > Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers

> > Been reading this + here's a bit of info.

> > I've been running LXC since early deployment + now LXD.

> > There are a few big performance killers related to WordPress. If you keep
> > these issues in mind, you'll be good.

> > 1) I run 100s of sites across many containers on many machines.
> > My business is private, high speed hosting, so I eat from my efforts.
> > No theory here.
> > I target WordPress site speed at 3000+ reqs/second, measured locally
> > using ab (ApacheBench). This is a crude tool + sufficient, as I issue
> > up to 1,000,000 requests over 5 concurrent connections against a server
> > for 30 seconds.
> > ab -k -t 30 -n 1000000 -c 5 $URL
> > This will crash most machines, unless they're tuned well.

> > 2) Memory + CPU. The big killer of performance anywhere is swap thrash. If
> > top shows swapping for more than a few seconds, likely your system is
> > heading toward a crash.
> > Fix: I tend to deploy OVH machines with 128G of memory, as this is enough
> > memory to handle huge spikes of memory usage across many sites, during
> &

Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-03-28 Thread Benoit GEORGELIN - Association Web4all
- Original message -
> From: "David Favor" <da...@davidfavor.com>
> To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
> Sent: Monday, 27 March 2017 12:55:09
> Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers

> Been reading this + here's a bit of info.

> I've been running LXC since early deployment + now LXD.

> There are a few big performance killers related to WordPress. If you keep
> these issues in mind, you'll be good.

> 1) I run 100s of sites across many containers on many machines.
> My business is private, high speed hosting, so I eat from my efforts.
> No theory here.
> I target WordPress site speed at 3000+ reqs/second, measured locally
> using ab (ApacheBench). This is a crude tool + sufficient, as I issue
> up to 1,000,000 requests over 5 concurrent connections against a server
> for 30 seconds.
> ab -k -t 30 -n 1000000 -c 5 $URL
> This will crash most machines, unless they're tuned well.

> 2) Memory + CPU. The big killer of performance anywhere is swap thrash. If top
> shows swapping for more than a few seconds, likely your system is heading
> toward a crash.
> Fix: I tend to deploy OVH machines with 128G of memory, as this is enough
> memory to handle huge spikes of memory usage across many sites, during
> traffic spikes... then recover...
> For example, running 100s of sites across many LXD containers, I've had
> machines sustain 250,000+ reqs/hour every day for months.
> At these traffic levels, <1 core used sustained + 50%ish memory use.
> Sites still show 3000+ reqs/sec using ab test above.

> 3) Database: I run MariaDB rather than MySQL as it's smokin' fast.
> I also relocate /tmp to tmpfs, so temp file i/o runs at memory speed,
> rather than disk speed.
> This ensures all MariaDB temp select set files (for complex selects)
> generate + access at memory speed.
> Also PHP session /tmp files run at memory speed.
> This is important to me, as many of my clients run large members

Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-03-27 Thread David Favor

Serge E. Hallyn wrote:

> On Tue, Mar 14, 2017 at 02:29:01AM +0100, Benoit GEORGELIN - Association
> Web4all wrote:
>> - Original message -
>>> From: "Simos Xenitellis" <simos.li...@googlemail.com>
>>> To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
>>> Sent: Monday, 13 March 2017 20:22:03
>>> Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers
>>>
>>> On Sun, Mar 12, 2017 at 11:28 PM, Benoit GEORGELIN - Association
>>> Web4all <benoit.george...@web4all.fr> wrote:
>>>> Hi lxc-users,
>>>>
>>>> I would like to know if you have any experience with a large number of
>>>> LXC/LXD containers, in terms of performance, stability and limitations.
>>>>
>>>> I'm wondering, for example, whether 100 containers behave the same as
>>>> 1,000 or 10,000 with the same configuration (leaving aside what the
>>>> containers are actually used for).
>>>>
>>>> I have been looking around for a couple of days for user/admin
>>>> feedback, but I have not been able to find any large deployments.
>>>>
>>>> Are there any resource limits, or a maximum number of containers that
>>>> can be deployed on the same node?
>>>> Besides the physical performance of the node, is there any specific
>>>> behavior that a large number of LXC/LXD containers can experience? I'm
>>>> not aware of any tests or limits that can occur besides the number of
>>>> processes, but I'm sure there might be some technical constraints on
>>>> the LXC/LXD side, maybe on namespace availability or some other
>>>> technical layer used by LXC/LXD.
>>>>
>>>> I would be interested to hear about your experience, or any
>>>> links/books/stories about such large deployments.
>>>
>>> It would be interesting to hear from anyone who can talk publicly
>>> about their large deployment.
>>>
>>> In any case, it should be possible to create, for example, 1000 web
>>> servers and then try to access each one and check for any issues with
>>> the response time.
>>> Another test would be to set up 1000 WordPress installations and
>>> check the response time and resource usage again.
>>> Scripts that create such a massive number of containers would also
>>> help to replicate issues in order to solve them.
>>>
>>> Simos


Been reading this + here's a bit of info.

I've been running LXC since early deployment + now LXD.

There are a few big performance killers related to WordPress. If you keep
these issues in mind, you'll be good.

1) I run 100s of sites across many containers on many machines.

   My business is private, high speed hosting, so I eat from my efforts.
   No theory here.

   I target WordPress site speed at 3000+ reqs/second, measured locally
   using ab (ApacheBench). This is a crude tool + sufficient, as I issue
   up to 1,000,000 requests over 5 concurrent connections against a server
   for 30 seconds.

   ab -k -t 30 -n 1000000 -c 5 $URL

   This will crash most machines, unless they're tuned well.

2) Memory + CPU. The big killer of performance anywhere is swap thrash. If top
   shows swapping for more than a few seconds, likely your system is heading
   toward a crash.

   Fix: I tend to deploy OVH machines with 128G of memory, as this is enough
   memory to handle huge spikes of memory usage across many sites, during
   traffic spikes... then recover...

   For example, running 100s of sites across many LXD containers, I've had
   machines sustain 250,000+ reqs/hour every day for months.

   At these traffic levels, <1 core used sustained + 50%ish memory use.

   Sites still show 3000+ reqs/sec using ab test above.

3) Database: I run MariaDB rather than MySQL as it's smokin' fast.

   I also relocate /tmp to tmpfs, so temp file i/o runs at memory speed,
   rather than disk speed.

   This ensures all MariaDB temp select set files (for complex selects)
   generate + access at memory speed.

   Also PHP session /tmp files run at memory speed.

   This is important to me, as many of my clients run large membership
   sites. Many are >40K members. These sites' performance would circle
   the drain if /tmp was on disk.

4) Disk Thrash: Becomes the killer as traffic increases.

5) Apache Logging: For several clients I'm currently retuning my Apache logging
   to skip logging of successful serves of images, css, js, fonts. I'll still
   log non-200s, as these need to be debugged.

   This can make a huge difference if memory pressure/use forces disk writes to
   actually go to disk, rather than kernel filesystem i/o buffers.

   Once memory pressure forces physical disk writes, disk i/o starves Apache
   from quickly serving uncached content. Very ugly.

   Right now I'm doing extensive filesystem testing, to reduce disk thrash
   during traffic spikes + related memory pressure.

6) Net Connection: If you're running 1000s of containers, best also check
   adapter saturation. I use 10Gig adapters + even at extreme traffic levels,
   they barely reach 10% saturation.

   This means 10Gig adapters are a must for me, as 10% is 1Gig, so using 1Gig
   adapters, site speed would begin to throttle, based on adapter saturation,
   which would be a bear to debug.

7) Apache: I've taken setting up Apache to ki
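
The ab check from item 1 can be wrapped in a small script that compares the measured rate with the 3000 reqs/second target. This is a sketch, not the setup used above: `TARGET_RPS`, `ab_rps` and `meets_target` are hypothetical names. Note that ab's -t option implies a cap of 50,000 requests and that a later -n replaces this cap, so the usage line sets -n high enough for the 30 second limit to be what ends the run.

```shell
#!/bin/sh
# Hypothetical wrapper around the ab run from item 1.
# TARGET_RPS and the function names are assumptions for illustration.
TARGET_RPS="${TARGET_RPS:-3000}"

# Read ab output on stdin and print the number from its
# "Requests per second:" summary line.
ab_rps() {
    awk '/^Requests per second:/ { print $4 }'
}

# Succeed when the measured rate ($1) is at least the target ($2).
meets_target() {
    awk -v rps="$1" -v target="$2" 'BEGIN { exit !(rps + 0 >= target + 0) }'
}

# Usage (needs ab and a reachable $URL):
#   rps=$(ab -k -t 30 -n 1000000 -c 5 "$URL" | ab_rps)
#   meets_target "$rps" "$TARGET_RPS" || echo "below target: $rps req/s"
```

Running this per site from cron or a deploy hook would flag a site that has dropped below the target without anyone having to read the full ab report.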

Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-03-27 Thread Serge E. Hallyn
On Tue, Mar 14, 2017 at 02:29:01AM +0100, Benoit GEORGELIN - Association 
Web4all wrote:
> - Original message -
> > From: "Simos Xenitellis" <simos.li...@googlemail.com>
> > To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
> > Sent: Monday, 13 March 2017 20:22:03
> > Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers
>
> > On Sun, Mar 12, 2017 at 11:28 PM, Benoit GEORGELIN - Association
> > Web4all <benoit.george...@web4all.fr> wrote:
> > > Hi lxc-users,
> > >
> > > I would like to know if you have any experience with a large number of
> > > LXC/LXD containers, in terms of performance, stability and limitations.
> > >
> > > I'm wondering, for example, whether 100 containers behave the same as
> > > 1,000 or 10,000 with the same configuration (leaving aside what the
> > > containers are actually used for).
> > >
> > > I have been looking around for a couple of days for user/admin
> > > feedback, but I have not been able to find any large deployments.
> > >
> > > Are there any resource limits, or a maximum number of containers that
> > > can be deployed on the same node?
> > > Besides the physical performance of the node, is there any specific
> > > behavior that a large number of LXC/LXD containers can experience? I'm
> > > not aware of any tests or limits that can occur besides the number of
> > > processes, but I'm sure there might be some technical constraints on
> > > the LXC/LXD side, maybe on namespace availability or some other
> > > technical layer used by LXC/LXD.
> > >
> > > I would be interested to hear about your experience, or any
> > > links/books/stories about such large deployments.
>
> > It would be interesting to hear from anyone who can talk publicly
> > about their large deployment.
>
> > In any case, it should be possible to create, for example, 1000 web
> > servers and then try to access each one and check for any issues with
> > the response time.
> > Another test would be to set up 1000 WordPress installations and
> > check the response time and resource usage again.
> > Scripts that create such a massive number of containers would also
> > help to replicate issues in order to solve them.
>
> > Simos
>
> Yes, it would be very nice to hear about this kind of infrastructure
> using lxc/lxd.
> I'm not yet ready to do this kind of testing, but if someone would like
> to work on this with me as a project, I can provide the technical
> infrastructure and scripts.
> It would be nice to produce a good test case and analysis to share with
> the community.

It should be pretty simple.  I've done testing like this to test other
software, which implicitly ended up testing lxd.

You'd probably create a first container and publish it locally,

(all commands untested, just an example)

lxc launch ubuntu:xenial template
lxc exec template -- apt -y install nginx
lxc stop template
lxc publish --alias template template

Then do your testing in a loop,

# spin up containers
for i in `seq -f "%04g" 1 1000`; do
lxc launch template nginx-$i
done

# spin up clients
... etc
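
For the client side, one possible sketch (not the elided part of the example above) is to time a single HTTP request against each container and list the slow ones. It assumes curl on the host, the nginx-0001 style names from the loop above, and an lxc client that accepts `-c 4 --format csv` for `lxc list`; the function names and the 0.5 second threshold are hypothetical.

```shell
#!/bin/sh
# Hypothetical client side for containers named nginx-0001 .. nginx-1000.

# Print the first IPv4 address LXD reports for a container.
# Assumes an lxc client that supports "-c 4 --format csv".
container_ip() {
    lxc list "$1" -c 4 --format csv | head -n 1 | cut -d' ' -f1
}

# Print "name seconds" for one container, timing a single HTTP GET.
probe() {
    ip=$(container_ip "$1")
    secs=$(curl -s -o /dev/null --max-time 10 -w '%{time_total}' "http://$ip/")
    echo "$1 $secs"
}

# Filter "name seconds" lines on stdin, keeping those slower than $1 seconds.
slower_than() {
    awk -v limit="$1" '$2 + 0 > limit + 0'
}

# Usage:
#   for i in $(seq -f "%04g" 1 1000); do probe "nginx-$i"; done | slower_than 0.5
```

Keeping the probe and the filter separate means the raw timings can also be saved to a file and compared between a run with 100 containers and a run with 1000.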

Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-03-13 Thread Benoit GEORGELIN - Association Web4all
- Original message -
> From: "Simos Xenitellis" <simos.li...@googlemail.com>
> To: "lxc-users" <lxc-users@lists.linuxcontainers.org>
> Sent: Monday, 13 March 2017 20:22:03
> Subject: Re: [lxc-users] Experience with large number of LXC/LXD containers

> On Sun, Mar 12, 2017 at 11:28 PM, Benoit GEORGELIN - Association
> Web4all <benoit.george...@web4all.fr> wrote:
> > Hi lxc-users,
> >
> > I would like to know if you have any experience with a large number of
> > LXC/LXD containers, in terms of performance, stability and limitations.
> >
> > I'm wondering, for example, whether 100 containers behave the same as
> > 1,000 or 10,000 with the same configuration (leaving aside what the
> > containers are actually used for).
> >
> > I have been looking around for a couple of days for user/admin
> > feedback, but I have not been able to find any large deployments.
> >
> > Are there any resource limits, or a maximum number of containers that
> > can be deployed on the same node?
> > Besides the physical performance of the node, is there any specific
> > behavior that a large number of LXC/LXD containers can experience? I'm
> > not aware of any tests or limits that can occur besides the number of
> > processes, but I'm sure there might be some technical constraints on
> > the LXC/LXD side, maybe on namespace availability or some other
> > technical layer used by LXC/LXD.
> >
> > I would be interested to hear about your experience, or any
> > links/books/stories about such large deployments.

> It would be interesting to hear from anyone who can talk publicly
> about their large deployment.

> In any case, it should be possible to create, for example, 1000 web
> servers and then try to access each one and check for any issues with
> the response time.
> Another test would be to set up 1000 WordPress installations and
> check the response time and resource usage again.
> Scripts that create such a massive number of containers would also
> help to replicate issues in order to solve them.

> Simos


Yes, it would be very nice to hear about this kind of infrastructure
using lxc/lxd.
I'm not yet ready to do this kind of testing, but if someone would like
to work on this with me as a project, I can provide the technical
infrastructure and scripts.
It would be nice to produce a good test case and analysis to share with
the community.

Benoit.

Re: [lxc-users] Experience with large number of LXC/LXD containers

2017-03-13 Thread Simos Xenitellis
On Sun, Mar 12, 2017 at 11:28 PM, Benoit GEORGELIN - Association
Web4all  wrote:
> Hi lxc-users,
>
> I would like to know if you have any experience with a large number of
> LXC/LXD containers, in terms of performance, stability and limitations.
>
> I'm wondering, for example, whether 100 containers behave the same as
> 1,000 or 10,000 with the same configuration (leaving aside what the
> containers are actually used for).
>
> I have been looking around for a couple of days for user/admin
> feedback, but I have not been able to find any large deployments.
>
> Are there any resource limits, or a maximum number of containers that
> can be deployed on the same node?
> Besides the physical performance of the node, is there any specific
> behavior that a large number of LXC/LXD containers can experience? I'm
> not aware of any tests or limits that can occur besides the number of
> processes, but I'm sure there might be some technical constraints on
> the LXC/LXD side, maybe on namespace availability or some other
> technical layer used by LXC/LXD.
>
> I would be interested to hear about your experience, or any
> links/books/stories about such large deployments.
>

It would be interesting to hear from anyone who can talk publicly
about their large deployment.

In any case, it should be possible to create, for example, 1000 web
servers and then try to access each one and check for any issues with
the response time.
Another test would be to set up 1000 WordPress installations and
check the response time and resource usage again.
Scripts that create such a massive number of containers would also
help to replicate issues in order to solve them.

Simos