Hello,

We've looked more at this system as performance begins to degrade:

During an scp of a file to one of the LXC containers, iostat shows "await" 
numbers for individual disks hitting 80 ms, but not for sustained amounts of 
time. Using ps and top to look at CPU, the scp process is using 70% of two CPU 
cores, and %SI in top fluctuates between 13 and 30%. Other processes, like top, 
ps, and sshd, begin to use more CPU than they normally would. For memory, 27G 
out of 32G is being used to cache I/O, but this seems like a good thing?
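
To see which kind of softirq is behind the %SI figure, the per-type counters in 
/proc/softirqs can be sampled twice and compared. Here is a minimal Python 
sketch of that idea, assuming the kernel exposes /proc/softirqs and that 
Python 3 is available on the host. The function names (parse_softirqs, 
softirq_deltas, sample) and the 10 second interval are my own choices for 
illustration, not taken from any existing tool.

```python
import time


def parse_softirqs(text):
    """Return {softirq_name: count summed over all CPUs} from /proc/softirqs text."""
    totals = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the "CPU0 CPU1 ..." header
        name, _, counts = line.partition(":")
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def softirq_deltas(before, after):
    """Per-type increase between two snapshots, largest increase first."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


def read_softirqs(path="/proc/softirqs"):
    """Read and parse the live counters (path is the standard procfs location)."""
    with open(path) as f:
        return parse_softirqs(f.read())


def sample(interval=10):
    """Print how much each softirq type grew over `interval` seconds (arbitrary default)."""
    first = read_softirqs()
    time.sleep(interval)
    for name, delta in softirq_deltas(first, read_softirqs()):
        print("%-10s %d" % (name, delta))
```

Calling sample() while an scp is running should show whether NET_RX, BLOCK, or 
some other type grows fastest. These are event counts rather than CPU time, so 
the result is only a rough pointer to where the softirq load comes from.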

If I reboot this box, it performs better at first, but then degrades again 
over 7-15 days until %SI CPU is sustained at 40-60%, and performance is slow 
enough that shutting down the LXC containers takes 20 minutes per container.
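
Since the decline takes days, it may help to log %SI at intervals and see 
whether it climbs steadily or jumps at some point. Below is a rough Python 
sketch that computes the softirq share of CPU time between two readings of 
/proc/stat. It assumes the usual field order on the aggregate "cpu" line (user, 
nice, system, idle, iowait, irq, softirq, steal) and Python 3; the function 
names are hypothetical and the 60 second interval is arbitrary.

```python
import time

SOFTIRQ_INDEX = 6   # position of softirq after the "cpu" label
COUNTED_FIELDS = 8  # user..steal; later guest fields are already part of user time


def parse_cpu_line(text):
    """Return the aggregate 'cpu' jiffy counters from /proc/stat text as ints."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":  # "cpu0", "cpu1", ... are skipped
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def softirq_percent(before, after):
    """Percent of CPU time spent in softirq between two counter snapshots."""
    deltas = [b - a for a, b in zip(before, after)][:COUNTED_FIELDS]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[SOFTIRQ_INDEX] / total


def read_cpu(path="/proc/stat"):
    """Read the live aggregate counters (path is the standard procfs location)."""
    with open(path) as f:
        return parse_cpu_line(f.read())


def log_once(interval=60):
    """Print one timestamped %SI reading averaged over `interval` seconds."""
    first = read_cpu()
    time.sleep(interval)
    pct = softirq_percent(first, read_cpu())
    print("%s %.1f" % (time.strftime("%Y-%m-%d %H:%M:%S"), pct))
```

Running log_once() from cron and appending the output to a file would give a 
trend line to compare against the 7-15 day window described above.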

Has anyone seen anything like this?


Thanks,

Ivan.

On Sep 15, 2011, at 9:42 AM, Iliyan Stoyanov wrote:

> Hi Ivan,
> 
> you should probably monitor with iostat and vmstat as well. Off the top 
> of my head I can think of at least 3 or 4 reasons why this might be 
> happening. I have similar problems with a simple laptop machine without LXC 
> containers on it (and don't have them on a server with a bunch of containers 
> on it). In my experience with bad SI, everything always comes back to being 
> RAM related. Also check your filesystem performance. Most filesystems 
> nowadays keep a ton of the journalling info in RAM. I know my response is not 
> exactly an answer to your specific question, but I hope it might give you 
> some pointers for better monitoring of the situation.
> 
> BR,
> 
> --ilf
> 
> On Thu, 2011-09-15 at 09:12 -0600, Ivan Fetch wrote:
>> Hello,
>> 
>> I've inherited a Sun 4540 (thumper) machine running 9 LXC containers. During 
>> the past few weeks we've been troubleshooting a decline in performance, 
>> which ends up in high %SI (software interrupt) CPU usage. I'm hoping someone 
>> here can help troubleshoot and narrow down what the real issue is - this one 
>> really has me stumped.
>> 
>> This box has 48 disks, arranged as 5 RAID6 arrays which are combined in a 
>> RAID0, using md. Two NICs are bonded together, and a bridge is used for the 
>> box's IP and the LXC network interfaces.
>> 
>> Linux is Ubuntu 10.04 with LXC 0.6.3; the containers are also 10.04. 
>> Containers run Apache, some custom image processing, Gaussian, and an FTP 
>> server...
>> 
>> The box performs well after a reboot, with all containers back online. After 
>> ~5 days, we notice that the box is sluggish, and backup jobs (Netbackup) get 
>> less than 1Mb/sec over the network. CPU eventually reaches 61% SI. Other 
>> processes (I am looking at ps -ax -o pcpu ..... |sort -n) begin taking much 
>> higher percent CPU than they should need, I imagine because the high %SI is 
>> taking cycles; e.g. I'll briefly see ps or sort or a shell using 6% CPU. Top 
>> shows %sy between 5-20, %wa under 5.
>> Memory (32GB) is mostly used for cache, and there is no swapping.
>> 
>> I know next-to-nothing about tracking down the cause for high %SI CPU usage.
>> 
>> 
>> Thanks for any help looking at this with a clear head,
>> 
>> - Ivan
>> 
>> _______________________________________________
>> Lxc-users mailing list
>> 
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/lxc-users