Re: [discuss] deadlocks

Jim Klimov Wed, 25 Nov 2015 15:57:07 -0800

On 25 November 2015 23:32:13 CET, Marion Hakanson <[email protected]> wrote:
>Hi Gabriele,
>
>The "prstat -Z" numbers by themselves do not tell you if paging is
>going on.  All you can really glean from it is that all values in the
>"MEMORY" percent column add up to less than 50% of the overall system
>memory.
>
>Look at "vmstat 1" output, the "pi" and "po" columns in particular,
>while it prints out one line per second over time.  You should be
>able to do this from within each zone as well as in the global zone.
>That will tell you if active paging/swapping is going on.
>
>You could also use "iostat -xn 1" and see if the drives used by your
>swap space are busy, to look for clues.
>
>Regards,
>
>Marion
>
>
>> Date: Wed, 25 Nov 2015 23:00:55 +0100
>> From: Gabriele Bulfon <[email protected]>
>> Reply-To: <[email protected]>
>> To: Marion Hakanson <[email protected]>, <[email protected]>
>> Subject: Re: [discuss] deadlocks
>> 
>> Thanks Marion!
>> this is what I was thinking, look at my global zone prstat -Z :
>> ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
>>      7      855 8697M 6877M    28%  16:03:15 2.6% cloudserver
>>      8      292 2978M 1766M   7.2%   0:20:09 0.4% www.sonicle.com
>>              92 1129M  477M   1.9% 110:18:26 0.1% global
>>      1       23   41M   31M   0.1%  11:55:15 0.0% asterisk
>>      5       26  524M  460M   1.9%  11:23:06 0.0% pkgserver
>>      3      434 2212M 1231M   5.0%  39:04:23 0.0% encoserver
>>      2      138 1402M  774M   3.2%   4:04:16 0.0% demo.sonicle.com
>> 
>> cloudserver and www.sonicle.com are the said zones.
>> I'm not sure what the SWAP column is saying, is cloudserver using all
>> that swap?
>> There's no clue about swaps inside the zones, swap -l says:
>> sonicle@cloudserver:~$ swap -l
>> swapfile             dev    swaplo   blocks     free
>> /dev/swap             -          8 25149432 24050136
>> but maybe it's not real...
>>
>> ----------------------------------------------------------------------
>> From: Marion Hakanson
>> [email protected]
>> Date: 25 November 2015 20:06:54 CET
>> Subject: Re: [discuss] deadlocks
>> Hi Gabriele,
>> The behavior you describe could be caused by saturation of any of
>> the resources on the system, not only CPU/load.  If all of RAM were
>> used up, for example, a new SSH session would pause until processes
>> were swapped/paged out enough to give room for a new SSH to run
>> in memory.  Similar results could happen if network or disk resources
>> were saturated.
>> Have a look at the USE method:
>> http://www.brendangregg.com/usemethod.html
>> On illumos/Solaris-based systems, you can use these commands to start
>> with:
>> prstat 1  (for CPU)
>> vmstat 1  (for memory)
>> iostat -xn 1 (for disk)
>> dladm show-link -s -i 1 (for network)
>> Also check /var/adm/messages for errors being logged at or near the
>> time of the issue.
>> Regards,
>> Marion
>> Date: Wed, 25 Nov 2015 19:39:38 +0100
>> From: Gabriele Bulfon
>> To:
>> Subject: [discuss] deadlocks
>> Hi,
>> I'm looking for help to find a solution to strange slowdowns on a
>> long-living XStream/illumos server.
>> This server runs 5-6 zones, on Intel 8 cores, 24GB RAM, separate boot
>> on SATA mirror rpool, and data on SAS raidz pool.
>> Two of these zones run essentially the same software: apache, tomcat,
>> cyrus, postfix, amavis, postgres.
>> Apache front-ends http to tomcat, running our collab webapps, working
>> all day on postfix smtp, cyrus imap and postgres db.
>> 1st zone is our own dev machine, running 4-5 users actually on all
>> the stack.
>> 2nd zone is our customers' machine, running around 1000 users on all
>> the stack, separated into about 10 cyrus domains
>> and their separate 10 instances of both webapps and databases.
>> Recently, it happens from time to time (1-2 times a week) that
>> everything starts to slow down.
>> Stopping one or the other zone's tomcat/apache gets everything back:
>> sometimes it's ours, sometimes it's the cloud.
>> Ok, at first sight one would say: your web app has problems.
>> But... then why do I have a hard time connecting via ssh to the zones
>> during these situations? Login takes minutes,
>> password to shell another lot of minutes, but prstat/top don't show
>> any high CPU usage on the global zone, nor inside the zones.
>> Then I stop one tomcat (sometimes one, sometimes the other), and
>> everything gets free.
>> Imap processes during these times are around 1000 in one machine,
>> around 100 on the other.
>> Then they abruptly go down, obviously as the web app closes
>> connections.
>> So my question is... how can I dig into this problem?
>> I would think that if the webapp is the problem, inside java/tomcat,
>> I should not experience problems during ssh.
>> Any possible limits on sockets? Any other idea?
>> Thanks....
>> Gabriele
>> 
> 
>


Also do not forget that in Solaris, swap does not necessarily mean allocated 
memory - it may well be merely reserved. VMs tend to do that, requiring enough 
swap to be available in case a VM's RAM has to be swapped out. Usually it is 
never really used.
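
As a rough illustration using the 'swap -l' row Gabriele posted above: that 
command counts in 512-byte blocks, so the device-level usage can be worked out 
as in the sketch below. The helper name is made up for this example, and it 
assumes the figures seen inside the zone reflect the real swap device.

```python
# Sketch: convert the `swap -l` figures quoted earlier in this thread
# into MiB. Assumes `swap -l` reports 512-byte blocks (its default).
BLOCK_BYTES = 512
MIB = 1024 * 1024


def swap_usage_mib(blocks, free):
    """Return (total, free, used) in MiB for one `swap -l` row.

    Hypothetical helper for illustration, not part of any tool.
    """
    def to_mib(n):
        return n * BLOCK_BYTES / MIB

    return to_mib(blocks), to_mib(free), to_mib(blocks - free)


# "blocks" and "free" columns from the /dev/swap row in the thread.
total, free, used = swap_usage_mib(25149432, 24050136)
print(f"total {total:.0f} MiB, free {free:.0f} MiB, used {used:.0f} MiB")
# prints: total 12280 MiB, free 11743 MiB, used 537 MiB
```

If those figures are real, only about 537 MiB of the swap device is actually 
occupied, far less than the 8697M in the prstat SWAP column for cloudserver, 
which would fit the "mostly reserved" reading.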

Try to leave a terminal (maybe in a VNC console) running 'vmstat 1' and 
'iostat -Xnz 1' so you can see whether the system is swapping during the 
slowdown.
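
If nobody can watch that terminal when the slowdown hits, the 'vmstat 1' 
output can be saved to a file and scanned afterwards. A minimal sketch, 
assuming the usual Solaris/illumos header row with "pi", "po" and "sr" column 
names; the function name and the sample numbers are invented for illustration.

```python
def paging_samples(lines, columns=("pi", "po", "sr")):
    """Yield (line_no, {col: value}) for samples with nonzero paging.

    Hypothetical helper: locates columns by name from the header row,
    so it does not depend on how many disk columns vmstat prints.
    """
    index = {}
    for line_no, line in enumerate(lines, 1):
        fields = line.split()
        if "pi" in fields and "po" in fields:
            # Header row (vmstat repeats it periodically): remember positions.
            index = {c: fields.index(c) for c in columns if c in fields}
            continue
        if not index or len(fields) <= max(index.values()):
            continue  # no header seen yet, or a short line such as "kthr ..."
        try:
            values = {c: int(fields[i]) for c, i in index.items()}
        except ValueError:
            continue  # not a numeric sample line
        if any(values.values()):
            yield line_no, values


# Made-up sample in the shape of `vmstat 1` output, not real measurements.
sample = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 9000000 500000 10 50  0  0  0  0  0  1  1  0  0 2000 9000 3000  3  1 96
 0 2 0 8900000 20000 400 900 120 340 800 0 5000 40 40 0 0 2500 9500 3500 4 6 90
""".splitlines()

for line_no, values in paging_samples(sample):
    print(line_no, values)
```

Keep in mind that the first data line vmstat prints is an average since boot, 
so a hit on that line alone says little about the current state.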

Walking fragmented memory may also be a time-consuming task. Review your JVM 
settings (GC etc.) to rule that out - or maybe even solve the issue by 
cleaning up often enough that single ops are not fatally slow.

HTH, Jim
--
Typos courtesy of K-9 Mail on my Samsung Android

