Re: [Lxc-users] Help troubleshooting declining performance / high %SI CPU when running 9 Ubuntu 10.04 LXCs

Ivan Fetch Tue, 11 Oct 2011 19:04:13 -0700

Hi,

The LXC containers share /var (13 TB) with the host OS. I want to change 
this, but I need to wait until the containers are less heavily used.


I will try some load testing without SSH, although even commands like top 
use 11% of a CPU core. Thanks for the suggestion!

- Ivan

On Oct 11, 2011, at 7:21 PM, Derek Simkowiak wrote:

> > Has anyone seen anything like this? 
> 
>     I have not seen the performance degradation you describe; however, SSH / 
> SCP does have very poor network I/O performance, due to undersized buffers.  
> There's an I/O patch here:
> 
> http://www.psc.edu/networking/projects/hpn-ssh/
> 
>     They get massive I/O improvements with that patch, so it's possible the 
> issue is SSH and has nothing to do with LXC.  
> 
>     I suggest using iperf (a network I/O tester) to see if you can reproduce 
> the symptoms without SSH.  Also run some big dd commands inside and outside 
> of the LXC container for comparison.  In short, stress the disk and network 
> without using SSH and see if you can find a reproducible test case.
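A rough Python stand-in for the dd comparison suggested above, as a sketch rather than a benchmark tool: it times a sequential write of zeros and calls fsync before stopping the clock, so the page cache does not flatter the result. The function name and default sizes are hypothetical; the idea is to run the same call on the host and inside a container against the shared /var.

```python
import os
import time


def write_throughput_mib_s(path, total_mb=256, block_kb=1024):
    """Sequentially write zeros to `path` and return throughput in MiB/s.

    Hypothetical helper, roughly analogous to a dd write test: the timer
    stops only after fsync, so data still sitting in the page cache is
    counted against the elapsed time.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push data to the device before timing stops
    elapsed = time.perf_counter() - start
    written_mib = blocks * block_kb / 1024
    return written_mib / elapsed if elapsed > 0 else float("inf")
```

For example, print(write_throughput_mib_s("/var/tmp/ddtest.bin")) on the host and then inside a container (the path is only an example; delete the file afterwards). A large gap between the two numbers would point at the container setup rather than at SSH.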
> 
>     Also, does the LXC container have its own partition?  Or does it share 
> the filesystem with the host O.S.?
> 
> 
> Thanks,
> Derek Simkowiak
> http://derek.simkowiak.net
> 
> 
> P.S. At the moment I'm getting a 403 error from the HPN-SSH link, but it 
> was working a few days ago.
> 
> On 10/11/2011 05:17 PM, Ivan Fetch wrote:
>> Hello,
>> 
>> We've looked more at this system as performance begins to degrade:
>> 
>> During an scp of a file to one of the LXC containers, iostat shows "await" 
>> values for individual disks hitting 80 ms, but not for sustained periods of 
>> time. Using ps and top to look at CPU, the scp process is using 70% of two 
>> CPU cores, and %SI in top fluctuates between 13 and 30%. Other processes, 
>> like top, ps, and sshd, begin to use more CPU than they normally would. For 
>> memory, 27 GB out of 32 GB is being used to cache I/O, but this seems like 
>> a good thing?
>> 
>> After a reboot this box performs better, but it keeps degrading for 7-15 
>> days until %SI CPU is sustained at 40-60% and performance is slow enough 
>> that shutting down the LXC containers takes 20 minutes per container.
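To track the %SI figure across a multi-day decline without keeping top open, a small sampler can compute the aggregate softirq share from /proc/stat. This is a sketch under stated assumptions: it reads only the aggregate "cpu" line, treats its first eight fields as user, nice, system, idle, iowait, irq, softirq and steal (guest time is already folded into user/nice, so it is left out of the total), and the function names are hypothetical.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (first eight fields).
USER, NICE, SYSTEM, IDLE, IOWAIT, IRQ, SOFTIRQ, STEAL = range(8)


def read_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line (not cpu0, cpu1...)."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def softirq_percent(before, after):
    """Share of all CPU time spent in softirq between two samples."""
    # Guest time is already included in user/nice, so sum only user..steal.
    delta = [a - b for a, b in zip(after[: STEAL + 1], before[: STEAL + 1])]
    total = sum(delta)
    return 100.0 * delta[SOFTIRQ] / total if total else 0.0


def sample_softirq_percent(interval=5.0, path="/proc/stat"):
    """Take two /proc/stat snapshots `interval` seconds apart."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return softirq_percent(before, after)
```

Logging sample_softirq_percent() with a timestamp every few minutes (from cron, for instance) would show whether the rise after a reboot is gradual or happens in steps.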
>> 
>> Has anyone seen anything like this?
>> 
>> 
>> Thanks,
>> 
>> Ivan.
>> 
>> On Sep 15, 2011, at 9:42 AM, Iliyan Stoyanov wrote:
>> 
>> 
>>> Hi Ivan,
>>> 
>>> you should probably also monitor with iostat and vmstat. Off the top of 
>>> my head I can think of at least 3 or 4 reasons why this might be 
>>> happening. I have similar problems on a simple laptop with no LXC 
>>> containers on it (and don't have them on a server with a bunch of 
>>> containers on it). In my experience, high SI has always come back to 
>>> being RAM related. Also check your filesystem performance: most 
>>> filesystems nowadays keep a lot of journalling information in RAM. I know 
>>> my response is not exactly an answer to your specific question, but I hope 
>>> it gives you some pointers for better monitoring of the situation.
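Following the suggestion to check RAM, a minimal sketch that pulls a few /proc/meminfo fields could be logged alongside vmstat output. The field selection is an assumption: a steadily growing Slab or SUnreclaim over several days would be one RAM-related pattern worth ruling out, but nothing in this thread establishes it. Which keys exist depends on the kernel version, and the helper names are hypothetical.

```python
# Fields to log; this selection is hypothetical, adjust as needed.
WATCH = ("MemFree", "Cached", "Dirty", "Slab", "SUnreclaim")


def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def meminfo_snapshot(path="/proc/meminfo"):
    """Return the watched fields; keys missing on older kernels map to None."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {key: info.get(key) for key in WATCH}
```

Comparing a snapshot taken right after a reboot with one taken once the box is sluggish would show whether any of these values tracks the slowdown.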
>>> 
>>> BR,
>>> 
>>> --ilf
>>> 
>>> On Thu, 2011-09-15 at 09:12 -0600, Ivan Fetch wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I've inherited a Sun 4540 (thumper) machine running 9 LXC containers. 
>>>> During the past few weeks we've been troubleshooting a decline in 
>>>> performance that ends in high %SI (software interrupt) CPU usage. I'm 
>>>> hoping someone here can help troubleshoot and narrow down what the real 
>>>> issue is - this one really has me stumped.
>>>> 
>>>> This box has 48 disks, configured as five md RAID6 arrays combined in a 
>>>> RAID0. Two NICs are bonded together, and a bridge carries both the box's 
>>>> IP and the LXC network interfaces.
>>>> 
>>>> Linux is Ubuntu 10.04 with LXC 0.6.3; the containers are also 10.04. The 
>>>> containers run Apache, some custom image processing, Gaussian, and an FTP 
>>>> server...
>>>> 
>>>> The box performs well after a reboot, with all containers back online. 
>>>> After ~5 days, we notice that the box is sluggish, and backup jobs 
>>>> (NetBackup) get less than 1Mb/sec over the network. CPU eventually reaches 
>>>> 61% SI. Other processes (I am looking at ps -ax -o pcpu ..... |sort -n) 
>>>> begin taking a much higher percentage of CPU than they should need, I 
>>>> imagine because the high %SI is taking cycles; e.g. I'll briefly see ps, 
>>>> sort, or a shell using 6% CPU. top shows %sy between 5 and 20 and %wa 
>>>> under 5. Memory (32 GB) is mostly used for cache, and there is no swapping.
>>>> 
>>>> I know next-to-nothing about tracking down the cause for high %SI CPU 
>>>> usage.
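One way to narrow down where the softirq time goes is to compare two /proc/softirqs snapshots, which count how many times each softirq type (NET_RX, NET_TX, BLOCK, TIMER, and so on) has run on each CPU. This is a sketch that assumes the running kernel provides /proc/softirqs (check with cat /proc/softirqs first); the numbers are invocation counts, not CPU time, and the function names are hypothetical.

```python
import time


def parse_softirqs(text):
    """Sum each softirq type's per-CPU counters from /proc/softirqs."""
    totals = {}
    for line in text.splitlines()[1:]:  # first line is the CPU0 CPU1 ... header
        name, _, counts = line.partition(":")
        if counts.strip():
            totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def softirq_deltas(before, after):
    """Return (type, increase) pairs, busiest softirq type first."""
    deltas = [(name, after[name] - before.get(name, 0)) for name in after]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


def sample_softirqs(interval=10.0, path="/proc/softirqs"):
    """Take two snapshots `interval` seconds apart and rank the increases."""
    with open(path) as f:
        before = parse_softirqs(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_softirqs(f.read())
    return softirq_deltas(before, after)
```

If NET_RX dominates, the receive path (bond, bridge, and container interfaces) would be the first place to look; if BLOCK dominates, the md arrays would be. The raw file is also worth a glance for per-CPU skew, since one overloaded CPU can hide in the totals.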
>>>> 
>>>> 
>>>> Thanks for any help looking at this with a clear head,
>>>> 
>>>> - Ivan
>>>> 
>>>> _______________________________________________
>>>> Lxc-users mailing list
>>>> 
>>>> 
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/lxc-users
>> 