We noticed something pretty bothersome in our environment when we went from SLES 11 SP2 to 11 SP3: the penalty for using more virtual memory on a machine than the "real" memory allocated to it has gone up dramatically. Have any of you seen anything like this?

This has been seen with all three SP3 kernels: 3.0.82-0.7.9, 3.0.93-0.8.2, and 3.0.101-0.8.1. kswapd0 seems to start frantically going through the virtual memory space looking for something it can free or swap out; the CPU use is very high and the machine is close to non-responsive. A test case we came up with was a simple Perl script that allocated 3.5 GB of memory as one big array and stepped through it. That took two hours on a server with 1 GB of memory total but a minute on a server with 3 GB free.

Our first engineer said that this is normal behavior because you shouldn't dip deeply into swap and expect the system to perform decently. That argument works in the Intel world because swap goes to disk and disks are very slow. It's not nearly as true on mainframes because swap (at least our swap) goes to extended storage and that's still memory. Since then SUSE have come around to our view that we shouldn't be seeing this. Besides, we got away with it in SP2. Something changed with SP3.

So: questions.

* Has anyone else seen this?
* Does anyone know what changed in the kernel between SP2 and SP3? SP2 kernels of similar release number to SP3 don't show this.
* Can we tune something to alleviate this?

For those who are interested, more detail follows:

The real-world applications that trigger this are Java applications that use a lot of memory. Some are WebSphere and some are home-grown. I'm convinced that it's the amount of memory used, not the details of the application, that's the problem.

Once kswapd finishes looking around, it really doesn't take that long to go through the array. Once the system gives up cleaning the cupboards and actually starts going through them it's not too bad; it's slower than with adequate memory, but by a factor closer to 4 than to 60. The CPU use is also what convinces me it's a kernel problem instead of understandably poor hardware performance.

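For anyone who wants to try reproducing this, the test case described above can be approximated with something like the sketch below. It is written in Python rather than Perl and is not the script we actually ran; the function name touch_memory and the 4 KiB page size are assumptions made for illustration. It allocates one large buffer and writes a single byte into every page, so each page has to become resident at some point, and it reports how long that took.

```python
import time

PAGE_SIZE = 4096  # assumed page size in bytes; adjust if your system differs


def touch_memory(total_bytes, page_size=PAGE_SIZE):
    """Allocate one large buffer and write one byte into every page.

    Hypothetical stand-in for the Perl test described in this post.
    Returns (pages_touched, elapsed_seconds).
    """
    start = time.monotonic()
    buf = bytearray(total_bytes)  # one big allocation, zero-filled
    pages = 0
    for offset in range(0, total_bytes, page_size):
        buf[offset] = 1  # a write, so the page must actually be backed
        pages += 1
    elapsed = time.monotonic() - start
    return pages, elapsed


# Example call for a 3.5 GB run (left commented out so that merely
# loading this file does not allocate anything large):
# pages, elapsed = touch_memory(int(3.5 * 1024**3))
# print(f"touched {pages} pages in {elapsed:.1f} s")
```

Running it with a size larger than the guest's real memory, once on an SP2 kernel and once on an SP3 kernel, should show whether the elapsed time differs the way it did for us. A Python bytearray is much more compact than a Perl array of scalars, so the absolute numbers will not match ours.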
We could get our test systems to go back to the old behavior by downgrading the kernel to SP2, even if the SP2 kernel had a higher version than the SP3 one. For example, we can run the 3.0.93-0.5.1 kernel from SP2 successfully on an otherwise-SP3 system; the 3.0.93-0.8.2 kernel from SP3 has the problem on an otherwise-SP2 system.

We asked about this here earlier; that thread starts at
http://www.mail-archive.com/linux-390@vm.marist.edu/msg64647.html

And if you got this far: thank you!

Ted Rodriguez-Bell
Mainframe and Midrange Services, Wells Fargo
te...@wellsfargo.com

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/