> since we're running the CPU at 100%, we don't think the problem is all
> page waits, although there have been some kernel messages from the
> Websphere machine in particular that memory allocations for Java
> processes (tasks?) are failing.

Check the max heap size for the JVM. If you're running that close to
the edge, WAS may be trying to do some smart avoidance of paging by
playing fast and loose with the JVM heap to avoid asking for more
resources. If it finally overruns that allocation, WAS will get very
unhappy and start behaving as you describe.

> Most disturbing have been a couple of episodes in which the Websphere
> guest just simply stops responding to terminal inputs. Our terminal
> sessions are via ssh, but these sessions seem to just hang. New
> connections time out, and logged in users can't kill running tasks and
> so forth. As I say, the Linux images are getting (taking) a fair amount
> of the CPU, so it's not that the whole Linux images are just dormant.

I've seen this happen in situations where the kernel gets hammered by
very high volumes of memory allocation requests and goes off into
la-la land during a swap operation.

Also, what filesystem are you using? We discovered a rather nasty bug
related to journaling in ReiserFS at very high traffic volumes a few
days ago that was causing a lot of kernel overhead and occasionally
sending things out into the void.

What do your swap rate and vmstat output look like?

-- db

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
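
On the swap rate question above: if it helps, here is a rough Python
sketch for pulling an average swap-in/swap-out figure out of captured
vmstat output. It assumes the usual procps column layout with `si` and
`so` headers; the function name `swap_rates` and the sample numbers are
made up for illustration and are not from the system being discussed.
By default it skips the first sample, since vmstat's first line reports
averages since boot.

```python
def swap_rates(vmstat_text, skip_first=True):
    """Return (avg_si, avg_so) from the text output of `vmstat <interval> <count>`."""
    lines = [ln.split() for ln in vmstat_text.strip().splitlines() if ln.strip()]
    # Locate the column header row (the one that names the si/so fields).
    header_idx = next(i for i, cols in enumerate(lines)
                      if "si" in cols and "so" in cols)
    header = lines[header_idx]
    si_col, so_col = header.index("si"), header.index("so")
    # Keep only fully numeric rows; this also drops repeated header lines.
    samples = [cols for cols in lines[header_idx + 1:]
               if len(cols) == len(header) and all(c.isdigit() for c in cols)]
    if skip_first and len(samples) > 1:
        samples = samples[1:]  # first row is the since-boot average
    if not samples:
        raise ValueError("no vmstat samples found")
    n = len(samples)
    avg_si = sum(int(row[si_col]) for row in samples) / n
    avg_so = sum(int(row[so_col]) for row in samples) / n
    return avg_si, avg_so


# Hypothetical capture, e.g. from `vmstat 5 3`; the values are invented.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0  81234   2048   1024  40960    5    9   120   300  500  900 60 30  5  5  0
 7  2  90000   1500   1000  39000  400  800  2000  4000  700 1500 50 45  0  5  0
 9  3  95000   1200    900  38000  600 1000  2500  4500  800 1600 48 47  0  5  0
"""

if __name__ == "__main__":
    print(swap_rates(SAMPLE))
```

Sustained non-zero `si`/`so` averages alongside a high `b` (blocked)
count would point toward the swap-thrash scenario rather than a pure
CPU problem.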
