Hi,

The LXC containers share /var (13 TB) with the host OS. This is something I want to change, but I need to wait until the containers are less heavily used.
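Derek's question (quoted below) about whether a container has its own partition can be checked directly: paths on the same mounted filesystem report the same device ID (`st_dev`) from `stat`. Here is a minimal Python sketch. It assumes a container rootfs at `/var/lib/lxc/example/rootfs`, which is a hypothetical path, since LXC layouts vary by version and configuration:

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID (st_dev).

    Equal st_dev values generally mean the paths sit on the same mounted
    filesystem, so their I/O lands on the same underlying block device.
    A bind mount reports the st_dev of its source.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    host_var = "/var"
    # Hypothetical container rootfs location; adjust to the real LXC config.
    container_root = "/var/lib/lxc/example/rootfs"
    if os.path.exists(host_var) and os.path.exists(container_root):
        shared = same_filesystem(host_var, container_root)
        print(f"{container_root} shares a filesystem with {host_var}: {shared}")
```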
I will try some load testing without SSH, although even a command like top uses 11% of a CPU core. Thanks for the suggestion!

- Ivan

On Oct 11, 2011, at 7:21 PM, Derek Simkowiak wrote:

> > Has anyone seen anything like this?
>
> I have not seen the performance degradation you describe; however, SSH/SCP
> does have very poor network I/O performance due to bad buffer sizes.
> There's an I/O patch here:
>
> http://www.psc.edu/networking/projects/hpn-ssh/
>
> They get massive I/O improvements with that patch. So, it's possible the
> issue is SSH and has nothing to do with LXC.
>
> I suggest using iperf (a network I/O tester) to see if you can reproduce
> the symptoms without SSH. Also run some big dd commands inside and outside
> of the LXC container, for comparison. In short, thrash the disk and network
> without using SSH and see if you can find a reproducible test case.
>
> Also, does the LXC container have its own partition? Or does it share
> the filesystem with the host O.S.?
>
>
> Thanks,
> Derek Simkowiak
> http://derek.simkowiak.net
>
>
> P.S. (At this moment I'm getting a 403 error from the HPN-SSH link... but it
> was working a few days ago.)
>
> On 10/11/2011 05:17 PM, Ivan Fetch wrote:
>> Hello,
>>
>> We've looked more at this system as performance begins to degrade:
>>
>> During an scp of a file to one of the LXC containers, iostat shows "await"
>> numbers for individual disks hitting 80, but not for sustained amounts of
>> time. Using ps and top to look at CPU, the scp process is using 70% of two
>> CPU cores, and %SI in top fluctuates between 13 and 30%. Other processes
>> begin to use more CPU than they normally would, like top, ps, sshd, etc. For
>> memory, 27G out of 32G is being used to cache IO, but this seems like a good
>> thing?
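iostat's "await" is the average time, in milliseconds, that completed I/O requests spent queued plus being serviced during the sampling interval. iostat derives it from the kernel's `/proc/diskstats` counters. The minimal Python sketch below reproduces that calculation, assuming the field layout in the kernel's I/O statistics documentation. It could log per-device await alongside an scp or a dd run, instead of watching momentary spikes in iostat:

```python
import os
import time
from typing import Dict, Tuple

# (reads completed, ms spent reading, writes completed, ms spent writing)
Counters = Tuple[int, int, int, int]


def parse_diskstats(text: str) -> Dict[str, Counters]:
    """Parse /proc/diskstats text into per-device I/O counters."""
    stats: Dict[str, Counters] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 11:
            continue
        # parts[0:3] are major, minor, device name; the counters follow.
        stats[parts[2]] = (int(parts[3]), int(parts[6]), int(parts[7]), int(parts[10]))
    return stats


def await_ms(before: Counters, after: Counters) -> float:
    """Average ms per completed I/O between two samples (iostat's 'await')."""
    ios = (after[0] - before[0]) + (after[2] - before[2])
    ticks = (after[1] - before[1]) + (after[3] - before[3])
    return ticks / ios if ios else 0.0


def sample(path: str = "/proc/diskstats") -> Dict[str, Counters]:
    with open(path) as f:
        return parse_diskstats(f.read())


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    first = sample()
    time.sleep(1.0)
    second = sample()
    for dev, after in sorted(second.items()):
        before = first.get(dev)
        if before is not None:
            print(f"{dev}: await {await_ms(before, after):.1f} ms")
```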
>>
>> If I reboot this box, it will perform better, but it will continue to
>> degrade for 7-15 days until %SI CPU is sustained at 40-60%, and performance
>> is slow enough that shutting down the LXC containers takes 20 minutes per
>> container.
>>
>> Has anyone seen anything like this?
>>
>>
>> Thanks,
>>
>> Ivan.
>>
>> On Sep 15, 2011, at 9:42 AM, Iliyan Stoyanov wrote:
>>
>>
>>> Hi Ivan,
>>>
>>> you should probably do some monitoring with iostat and vmstat as well. Off
>>> the top of my head I can think of at least 3 or 4 reasons why this might be
>>> happening. I have similar problems with a simple laptop without LXC
>>> containers on it (and don't have them on a server with a bunch of
>>> containers on it). In my experience with bad SI, everything always comes
>>> back to being RAM-related. Also check your filesystem performance. Most
>>> filesystems nowadays keep a ton of journalling info in RAM. I know my
>>> response is not exactly an answer to your specific question, but I hope it
>>> might give you some pointers for better monitoring of the situation.
>>>
>>> BR,
>>>
>>> --ilf
>>>
>>> On Thu, 2011-09-15 at 09:12 -0600, Ivan Fetch wrote:
>>>
>>>> Hello,
>>>>
>>>> I've inherited a Sun 4540 (Thumper) machine running 9 LXC containers.
>>>> During the past few weeks we've been troubleshooting a decline in
>>>> performance, which ends up in high %SI (software interrupt) CPU usage. I'm
>>>> hoping someone here can help troubleshoot and narrow down what the real
>>>> issue is - this one really has me stumped.
>>>>
>>>> This box has 48 disks, configured as 5 RAID6 arrays combined in a RAID0,
>>>> using md. Two NICs are bonded together, and a bridge is used for the box's
>>>> IP and the LXC network interfaces.
>>>>
>>>> Linux is Ubuntu 10.04, LXC 0.6.3; containers are also 10.04. Containers
>>>> run Apache, some custom image processing, Gaussian, and an FTP server...
>>>>
>>>> The box performs well after a reboot, with all containers back online.
>>>> After ~5 days, we notice that the box is sluggish, and backup jobs
>>>> (NetBackup) get less than 1Mb/sec over the network. CPU eventually reaches
>>>> 61% SI. Other processes (I am looking at ps -ax -o pcpu ..... |sort -n)
>>>> begin taking a much higher percentage of CPU than they should need, I
>>>> imagine because the high %SI is taking cycles; e.g., I'll briefly see ps or
>>>> sort or a shell using 6% CPU. Top shows %sy between 5-20, %wa under 5.
>>>> Memory (32 GB) is mostly used for cache, and there is no swapping.
>>>>
>>>> I know next to nothing about tracking down the cause of high %SI CPU
>>>> usage.
>>>>
>>>>
>>>> Thanks for any help looking at this with a clear head,
>>>>
>>>> - Ivan
>>>>
>>>> _______________________________________________
>>>> Lxc-users mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/lxc-users
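top's %si column is the share of CPU time spent in softirq handlers. Logging it at intervals over the days the box takes to degrade, rather than glancing at top, would show whether it climbs steadily or in steps (for example, around backup windows). The minimal Python sketch below derives approximately the same percentages top reports from two samples of the aggregate `cpu` line in `/proc/stat`, assuming the column order documented in proc(5):

```python
import os
import time
from typing import Dict, List

# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(text: str) -> List[int]:
    """Return the aggregate 'cpu' jiffy counters from /proc/stat text.

    Only user..steal are kept: guest time is already included in
    user/nice, so summing it too would double-count.
    """
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels report fewer columns; treat missing ones as 0.
            return values + [0] * (len(FIELDS) - len(values))
    raise ValueError("no aggregate 'cpu' line found")


def percent_breakdown(before: List[int], after: List[int]) -> Dict[str, float]:
    """Share of CPU time per state between two samples, in percent."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = parse_cpu_line(f.read())
    time.sleep(1.0)
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.read())
    pct = percent_breakdown(first, second)
    print(f"si {pct['softirq']:.1f}%  sy {pct['system']:.1f}%  wa {pct['iowait']:.1f}%")
```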
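Because %si lumps all softirq work together, a next step in narrowing down the cause is `/proc/softirqs`. It is available since kernel 2.6.31, so it exists on the 2.6.32 kernel that Ubuntu 10.04 ships. The file counts how often each softirq vector (NET_RX, NET_TX, BLOCK, TIMER, ...) has run on each CPU. A vector whose rate keeps growing as the box degrades points at the subsystem to investigate; on a bonded, bridged setup, NET_RX would be a natural suspect. The minimal sketch below assumes the usual layout of a CPU header line followed by `NAME: count count ...` rows. These are invocation counts, not CPU time, so they suggest a culprit rather than measure its cost:

```python
import os
import time
from typing import Dict, List, Tuple


def parse_softirqs(text: str) -> Dict[str, int]:
    """Sum each softirq vector's counts across all CPUs in /proc/softirqs text."""
    totals: Dict[str, int] = {}
    for line in text.splitlines()[1:]:  # first line is the CPU header
        name, sep, counts = line.partition(":")
        if not sep:
            continue
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def busiest(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Vectors sorted by how many times they ran between the two samples."""
    deltas = [(name, after[name] - before.get(name, 0)) for name in after]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and os.path.exists("/proc/softirqs"):
    with open("/proc/softirqs") as f:
        first = parse_softirqs(f.read())
    time.sleep(1.0)
    with open("/proc/softirqs") as f:
        second = parse_softirqs(f.read())
    for name, count in busiest(first, second)[:5]:
        print(f"{name:>12}: {count} runs in ~1 s")
```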
