Over the past few months we have had a recurring problem: every so often 
(1-5 times a month), our production app server (running JBoss) becomes 
extremely slow, with vmstat reporting several hundred thousand context 
switches per second and both CPUs (dual-processor machine) maxed out, split 
about 25/75 between user and system time. 

When it first starts happening, JBoss still responds to most requests 
(albeit very slowly) while others time out, and eventually it reaches a 
point where nearly all requests time out. At that point, it is impossible to 
do much more than log in to the server and kill -9 the JBoss process, and 
even that takes about 5 minutes. As soon as the JBoss process is killed, the 
entire system goes back to normal, and we can restart JBoss and live happily 
until it happens again after about a week of uptime.

Here is the vmstat output right before restarting JBoss:

procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
12  0 100952  16412 282520 1244400    0    0     0     0  106 306339 17 83  0  0
13  0 100952  16408 282520 1244400    0    0     0    24  117 297674 23 77  0  0
15  0 100952  16408 282520 1244400    0    0     0     0  108 336135 17 83  0  0
16  0 100952  16408 282520 1244400    0    0     0     0  108 159630 20 80  0  0
15  0 100952  16408 282520 1244400    0    0     0     0  116 176452 24 76  0  0
14  0 100952  16408 282520 1244400    0    0     0     0  116 99453 27 73  0  0
15  0 100952  16416 282520 1244400    0    0     0    24  117 96588 27 73  0  0
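
As a rough check on those figures, the samples above can be averaged with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the column positions assume the 16-column vmstat layout shown above.

```java
import java.util.Arrays;
import java.util.List;

final class VmstatSummary {
    // 0-based column positions in the layout shown above:
    // r b swpd free buff cache si so bi bo in cs us sy id wa
    static final int CS = 11, US = 12, SY = 13;

    /** Returns {mean cs, mean us, mean sy} over the given vmstat data lines. */
    static double[] summarize(List<String> lines) {
        double cs = 0, us = 0, sy = 0;
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            cs += Long.parseLong(f[CS]);
            us += Long.parseLong(f[US]);
            sy += Long.parseLong(f[SY]);
        }
        int n = lines.size(); // caller is expected to pass at least one line
        return new double[] { cs / n, us / n, sy / n };
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
            "12  0 100952  16412 282520 1244400    0    0     0     0  106 306339 17 83  0  0",
            "13  0 100952  16408 282520 1244400    0    0     0    24  117 297674 23 77  0  0",
            "15  0 100952  16408 282520 1244400    0    0     0     0  108 336135 17 83  0  0",
            "16  0 100952  16408 282520 1244400    0    0     0     0  108 159630 20 80  0  0",
            "15  0 100952  16408 282520 1244400    0    0     0     0  116 176452 24 76  0  0",
            "14  0 100952  16408 282520 1244400    0    0     0     0  116 99453 27 73  0  0",
            "15  0 100952  16416 282520 1244400    0    0     0    24  117 96588 27 73  0  0");
        double[] m = summarize(samples);
        System.out.println("mean cs/s = " + Math.round(m[0])
            + ", mean us = " + m[1] + ", mean sy = " + m[2]);
    }
}
```

For these seven samples that works out to roughly 210,000 context switches per second and about a 22/78 user/system split.
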

And here is what it looks like right after killing JBoss:

 1  0  91632 1409604 282520 1244436    0    0     0     0  171   130  0  0 100  0
 1  0  91632 1409604 282520 1244436    0    0     0   176  178    68  0  1 99  1
 1  0  91632 1409604 282520 1244436    0    0     0     0  125    38  0  0 100  0

And here is what it looks like right after JBoss has started back up again:

 5  0  91632 1139332 286124 1321444    0    0     0     0  329   636 76  2 22  0
 7  0  91632 1133860 286124 1321448    0    0     0     0  447   839 98  1  1  0
 4  0  91632 1133520 286124 1321468    0    0     0     0 1284  2436 96  3  1  0
 3  0  91632 1131932 286124 1321468    0    0     0   292 2388  4477 86  4 10  0
 6  0  91632 1131932 286160 1321468    0    0    20     0  814  1428 94  1  4  1
 2  0  91632 1121720 286160 1321468    0    0     0     0  272   504 97  1  2  0

We are running JBoss 4.0.2, but this was happening on 3.2.3 also (we upgraded 
hoping this would go away, and it hasn't). The OS is "Red Hat Enterprise Linux 
ES release 3 (Taroon Update 5)", and the JVM is build 1.4.2_08-b03.

One possibility I suspect is our use of temporary files that are marked 
"deleteOnExit". We use them a lot, and store them all in one directory. As 
time goes on, their number only increases. The last time this happened, 
there were about 64,000 temporary files in the directory.
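
For context on why the count only grows: File.deleteOnExit() only requests deletion at normal JVM termination, so the files stay around for as long as the server runs, and a kill -9 leaves them behind for good. Whether that has anything to do with the context-switch storm is not established. Below is a minimal sketch of the alternative, deleting each file as soon as it is no longer needed. It assumes a temp file is only needed for the duration of one unit of work; ScratchFiles, Work and withTempFile are made-up names, not anything from JBoss or from our code.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

final class ScratchFiles {
    /** Hypothetical callback: whatever the application does with the temp file. */
    interface Work {
        void run(File f) throws IOException;
    }

    /** Creates a temp file in dir, runs the work, then deletes the file right away. */
    static void withTempFile(File dir, Work work) throws IOException {
        File tmp = File.createTempFile("scratch", ".tmp", dir);
        try {
            work.run(tmp);
        } finally {
            // Delete now instead of waiting for JVM exit; fall back to
            // deleteOnExit only if the immediate delete fails.
            if (!tmp.delete() && tmp.exists()) {
                tmp.deleteOnExit();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        withTempFile(dir, new Work() {
            public void run(File f) throws IOException {
                Writer w = new FileWriter(f);
                try {
                    w.write("intermediate data");
                } finally {
                    w.close();
                }
            }
        });
    }
}
```

With this pattern the directory should hold only the files currently in use, rather than everything created since the last clean shutdown.
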

View the original post : 
http://www.jboss.org/index.html?module=bb&op=viewtopic&p=3888668#3888668

_______________________________________________
JBoss-user mailing list
JBoss-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jboss-user
