Is this a per-zone command, or is the output the same in any zone? This is what I get in the smaller www zone:

 kthr      memory            page            disk          faults      cpu
 r b w   swap    free   re   mf pi po fr de sr s0 s1 s2  s3    in     sy     cs us sy id
 1 0 0 6531500 798908 1504 6818  0  0  0  0  0  0  0  0 181  4646 465913  68813 13  6 81
 0 0 0 6531412 796496  785 1778  0  0  0  0  0  0 24  0   9  6061 309567  21991  9  2 89
 2 0 0 6531372 792056  639 5799  0  0  0  0  0  0  0  0  11 10890 371104 120952 15  6 79
 1 0 0 6534576 824876  407 4644  0  0  0  0  0  0  0  0  77  5366 193555  10202 12  2 86
 1 0 0 6555168 838108  151 1071  0  0  0  0  0  0  0  0   2  4107   8759   3522  4  1 95
 0 0 0 6562804 841320  160 2223  0  0  0  0  0  0  0  0 405  6257  12961   9741  6  3 90
 2 0 0 6570848 841524 1005 2068  0  0  0  0  0  0 17  0   9  4610  43174   6909  8  2 90
 0 0 0 6578884 843892  720 1977  0  0  0  0  0  0  0  0   1  6613  35529  14808  3  1 96
 0 0 0 6578876 842680  926 1558  0  0  0  0  0  0  0  0  21  7391  63111  19592  5  2 93
 0 0 0 6575672 836400 1376 5808  0  0  0  0  0  0  0  0   4  6102  91652  20029  8  2 89
 0 0 0 6574764 834404 1098 2068  0  0  0  0  0  0  0  0 193  9347 162067  39057 10  4 86
 1 0 0 6572020 826300 1953 7228  0  0  0  0  0  0 14  0  31 10959 156369  49396 11  6 84
 0 0 0 6571000 813060 1342 5191  0  0  0  0  0  0  0  0   2  5087 391732  59501  9  3 88
 0 0 0 6563644 802900 1176 3594  0  0  0  0  0  0  0  0   3  4183 538480  16301 12  2 86
 3 0 0 6561048 795112 1787 2832  0  0  0  0  0  0  0  0   1  8068 493355  66482 14  4 83
 1 0 0 6563072 793696 1239 1712  0  0  0  0  0  0  0  0 265  7595 229455  16196  8  3 89
 0 0 0 6553832 780652 1471 4747  0  0  0  0  0  0 15  0   2  4445 216211   5103  9  1 90
 1 0 0 6545468 768120 2343 6878  0  0  0  0  0  0  0  0  22  5014 159315  13014 17  2 80

and this is iostat in the other cloud zone:

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   22.0   27.0  337.5  931.1  0.0  0.2    0.0    5.1   0  21 sd3
   16.0   27.0  241.0  930.1  0.0  0.2    0.0    5.0   0  17 sd4
   17.0   27.0  338.5  931.1  0.0  0.2    0.0    5.2   0  18 sd5
   15.0   26.0  212.0  921.6  0.0  0.2    0.0    4.4   0  15 sd6
   17.0   27.0  216.5  934.6  0.0  0.2    0.0    4.1   0  14 sd7

Sonicle S.r.l.
Gabriele Bulfon
Tel +39 028246016 - Fax +39 028243880
via Fra Cristoforo, 14/D - 20142 - Milano - Italy
http://www.sonicle.com

----------------------------------------------------------------------------------

From: Jim Klimov [email protected]
To: Marion Hakanson [email protected]
Date: 26 November 2015 0:56:23 CET
Subject: Re: [discuss] deadlocks

On 25 November 2015 23:32:13 CET, Marion Hakanson wrote:

Hi Gabriele,

The "prstat -Z" numbers by themselves do not tell you if paging is going on.
All you can really glean from it is that all values in the "MEMORY" percent
column add up to less than 50% of the overall system memory.

Look at "vmstat 1" output, the "pi" and "po" columns in particular, while it
prints out one line per second over time. You should be able to do this from
within each zone as well as in the global zone. That will tell you if active
paging/swapping is going on.

You could also use "iostat -xn 1" and see if the drives used by your swap
space are busy, to look for clues.

Regards,

Marion

Date: Wed, 25 Nov 2015 23:00:55 +0100
From: Gabriele Bulfon
Reply-To:
To: Marion Hakanson ,
Subject: Re: [discuss] deadlocks

Thanks Marion! This is what I was thinking; look at my global zone prstat -Z:

ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
     7      855 8697M 6877M    28%  16:03:15 2.6% cloudserver
     8      292 2978M 1766M   7.2%   0:20:09 0.4% www.sonicle.com
     0       92 1129M  477M   1.9% 110:18:26 0.1% global
     1       23   41M   31M   0.1%  11:55:15 0.0% asterisk
     5       26  524M  460M   1.9%  11:23:06 0.0% pkgserver
     3      434 2212M 1231M   5.0%  39:04:23 0.0% encoserver
     2      138 1402M  774M   3.2%   4:04:16 0.0% demo.sonicle.com

cloudserver and www.sonicle.com are the two zones in question.
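(A side sketch for the per-zone question at the top of the thread: one way to compare is to sample vmstat in every zone from the global zone. This assumes a stock illumos global zone where zoneadm and zlogin are available; the one-second interval and five samples are arbitrary choices.)

    # Take 5 one-second vmstat samples in every running zone, plus the
    # global zone itself, so the pi/po columns can be compared side by side.
    for z in $(zoneadm list); do
        echo "=== $z ==="
        if [ "$z" = "global" ]; then
            vmstat 1 5
        else
            zlogin "$z" vmstat 1 5
        fi
    done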
I'm not sure what the SWAP column is saying; is cloudserver really using all that swap? There's no clue about swap inside the zones; swap -l says:

sonicle@cloudserver:~$ swap -l
swapfile             dev  swaplo   blocks     free
/dev/swap            -         8 25149432 24050136

but maybe it's not real...

----------------------------------------------------------------------------------

From: Marion Hakanson [email protected]
Date: 25 November 2015 20:06:54 CET
Subject: Re: [discuss] deadlocks

Hi Gabriele,

The behavior you describe could be caused by saturation of any of the
resources on the system, not only CPU/load. If all of RAM were used up, for
example, a new SSH session would pause until processes were swapped/paged out
enough to give room for a new SSH to run in memory. Similar results could
happen if network or disk resources were saturated.

Have a look at the USE method:
http://www.brendangregg.com/usemethod.html

On illumos/Solaris-based systems, you can use these commands to start with:
prstat 1 (for CPU)
vmstat 1 (for memory)
iostat -xn 1 (for disk)
dladm show-link -s -i 1 (for network)

Also check /var/adm/messages for errors being logged at or near the time of
the issue.

Regards,

Marion

Date: Wed, 25 Nov 2015 19:39:38 +0100
From: Gabriele Bulfon
To:
Subject: [discuss] deadlocks

Hi,

I'm looking for help finding a solution to strange slowdowns on a long-lived
XStream/illumos server. The server runs 5-6 zones on an 8-core Intel CPU with
24GB RAM, separate boot on a SATA mirror rpool, and data on a SAS raidz pool.

Two of these zones run essentially the same software: apache, tomcat, cyrus,
postfix, amavis, postgres. Apache front-ends HTTP to Tomcat, which runs our
collab webapps, working all day against postfix smtp, cyrus imap and the
postgres db. The 1st zone is our own dev machine, currently running 4-5 users
on the whole stack. The 2nd zone is our customers' machine, running around
1000 users on the whole stack, separated into about 10 cyrus domains with 10
correspondingly separate instances of both webapps and databases.

Recently it happens from time to time (1-2 times a week) that everything
starts to slow down. Stopping one or the other zone's tomcat/apache gets
everything back: sometimes it's ours, sometimes it's the cloud one. Ok, at
first sight one would say: your web app has problems. But... then why do I
have such a hard time connecting via ssh to the zones during these
situations? Login takes minutes, and getting from password to shell takes
many more minutes, yet prstat/top don't show any high CPU usage in the global
zone, nor inside the zones. Then I stop one tomcat (sometimes one, sometimes
the other), and everything gets free again. Imap processes during these
episodes are around 1000 on one machine and around 100 on the other; then
they abruptly drop, obviously because the web app closes its connections.

So my question is... how can I dig into this problem? I would think that if
the webapp were the problem, inside java/tomcat, I should not experience
problems during ssh. Any possible limits on sockets? Any other ideas?

Thanks....
Gabriele

Also do not forget that in Solaris, swap does not necessarily mean allocated
memory - it can well be just reserved. VMs tend to do that - requiring enough
swap to be available in case it is needed for a VM's RAM to get swapped out.
Usually it is never really used.

Try to leave a terminal (maybe in a vnc console) with 'vmstat 1' and
'iostat -xnz 1' so you'd see if the system is swapping during the slowdown.

Also, maybe walking the fragmented memory is a time-consuming task. Revise
your JVM settings (GC etc.)
to rule that out - or maybe even solve the issue by cleaning up often enough
that single ops are not fatally slow.

HTH,
Jim

--
Typos courtesy of K-9 Mail on my Samsung Android
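(On the reserved-versus-allocated point above, 'swap -s' splits the number up and makes it easier to tell whether the large SWAP figures in prstat -Z are real usage or just reservation. A minimal sketch to run in the global zone and again inside cloudserver; the interpretation in the comments is general illumos/Solaris behavior, not anything specific to this box.)

    # 'swap -s' reports allocated + reserved = used, plus what is still
    # available.  A large "reserved" together with plenty of "available"
    # means the space is only promised, not written to; real paging shows
    # up in vmstat's pi/po and sr columns instead.
    swap -s
    swap -l    # per-device view: compare 'blocks' with 'free'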
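(For the suggestion of leaving monitors running until the next slowdown, something like the following could be left under nohup, or in the vnc/screen session, so the numbers can be matched against the time of the next incident. A rough sketch only; the 10-second interval and log paths are placeholders, and the samplers print no timestamps of their own, so an occasional 'date' in the same session helps with correlation.)

    # Continuous samplers covering the USE-method checklist from earlier in
    # the thread: memory, disk and network.
    nohup vmstat 10                > /var/tmp/vmstat.log 2>&1 &
    nohup iostat -xnz 10           > /var/tmp/iostat.log 2>&1 &
    nohup dladm show-link -s -i 10 > /var/tmp/dladm.log  2>&1 &
    # After a slowdown, read the logs around that time and also check:
    #   grep -i error /var/adm/messages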
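(And on the GC angle: before changing collector settings, it may be enough to turn on GC logging in the two Tomcats and see whether long full-GC pauses line up with the slowdowns. A sketch for a Java 7/8-era JVM; setenv.sh is the usual place for CATALINA_OPTS in a stock Tomcat, and the log path is just an example.)

    # In $CATALINA_BASE/bin/setenv.sh (or wherever CATALINA_OPTS is set):
    CATALINA_OPTS="$CATALINA_OPTS \
        -verbose:gc \
        -XX:+PrintGCDetails \
        -XX:+PrintGCDateStamps \
        -Xloggc:/var/tmp/tomcat-gc.log"
    export CATALINA_OPTS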
