Re: OOM Condition on SLES11 running WAS - Tuning problems? - Possible Solution
We disabled our LDAP SSO after I noticed, in an strace -f -p [pid] of the nodeagent, that all spawned threads were making LDAP calls to the internal authentication server. This opens up other questions, like why LDAP would cause a memory leak in a raw (no applications) WebSphere installation, but at least the memory leak seems to have been solved.

On Wed, Aug 11, 2010 at 2:01 PM, Daniel Tate wrote:
> Has anyone gotten this to work successfully on SLES11? Other versions
> of SLES? Red Hat?
>
> Here is a description of the problem from the WebSphere admin's side:
>
> > We are running 6.1.0.25 ND on zLinux (SLES 11). We have 2 nodes created
> > on a server with 14 application servers per node. All JVM sizes are set
> > to the default 50/256. No applications are installed.
> > The server has 10 GB of real memory and 30 GB of swap. We start
> > the node agents, then start all application servers. Once
> > everything is up, approximately 7 GB of memory is used.
> > With the server sitting idle, the memory used by the node agents
> > continues to grow to well over 1 GB. Memory is consumed until all
> > real memory and swap are exhausted. At this point the server becomes
> > unavailable and eventually kills off all Java processes on the
> > machine. No heap dumps have ever been generated.
--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
For more information on Linux on System z, visit http://wiki.linuxvm.org/
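For anyone wanting to repeat the strace diagnosis described at the top of the thread, here is a minimal sketch. The pid lookup, trace file name, and the sample trace lines are assumptions for illustration, not output from the affected system; the idea is simply to capture network syscalls from the nodeagent and count connects to the LDAP ports (389 plain, 636 LDAPS).

```shell
# On the live system you would capture network syscalls, e.g. (assumption):
#   strace -f -tt -e trace=network -p "$(pgrep -f nodeagent | head -1)" \
#          -o nodeagent.trc
# The sample below stands in for real strace output.
trace='connect(55, {sa_family=AF_INET, sin_port=htons(389), sin_addr=inet_addr("10.1.1.5")}, 16) = 0
connect(56, {sa_family=AF_INET, sin_port=htons(636), sin_addr=inet_addr("10.1.1.5")}, 16) = 0
connect(57, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.1.1.9")}, 16) = 0'
# Count connect() calls aimed at the LDAP ports:
printf '%s\n' "$trace" | grep -cE 'htons\((389|636)\)'
```

A surge of such connects from every spawned thread, as described above, points the finger at the SSO path rather than the application tier.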
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Has anyone gotten this to work successfully on SLES11? Other versions of SLES? Red Hat?

Here is a description of the problem from the WebSphere admin's side:

We are running 6.1.0.25 ND on zLinux (SLES 11). We have 2 nodes created on a server with 14 application servers per node. All JVM sizes are set to the default 50/256. No applications are installed. The server has 10 GB of real memory and 30 GB of swap. We start the node agents, then start all application servers. Once everything is up, approximately 7 GB of memory is used.

With the server sitting idle, the memory used by the node agents continues to grow to well over 1 GB. Memory is consumed until all real memory and swap are exhausted. At this point the server becomes unavailable and eventually kills off all Java processes on the machine. No heap dumps have ever been generated.
Re: OOM Condition on SLES11 running WAS - Tuning problems?
It's the nodeagents that are consuming the memory. The weird thing is that RSS and VIRT are both larger than what WAS shows in the console, and VIRT grows even though no swapping was active.

We simply cannot explain the behavior. We have PMRs open for every possible avenue here, so I'll ask the question another way: does anyone have WebSphere (Network Deployment edition) running successfully on s390x, SLES11 GA?
Re: OOM Condition on SLES11 running WAS - Tuning problems?
And to make it somewhat easier to log in :) start just a few servers at boot, then start one more at a time until you see problems. Then it's time to stop that last one and either increase memory or move the not-yet-started servers onto one or more other Linux machines.

___
Tore Agblad
Volvo Information Technology
Infrastructure Mainframe Design & Development, Linux servers
Dept 4352 DA1S
SE-405 08, Gothenburg, Sweden

Telephone: +46-31-3233569
E-mail: tore.agb...@volvo.com
http://www.volvo.com/volvoit/global/en-gb/
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Daniel Tate wrote:
> Unfortunately it did not. It continued to grow and grow until oom killed
> in a loop. We're supposed to have a max size of 512MB per VM.

A max heap of 512 MB per VM doesn't mean that each VM will only use 512 MB; it means the heap will grow to 512 MB, plus whatever extra memory the JVM itself requires. That extra can actually be quite large; I have JVMs that are half again as big as their heap in some cases [1].

> Last time i did any WAS stuff was on Linux but on version 3 when it was
> new.. a long time ago.
>
> top - 14:04:46 up 2:09, 5 users, load average: 60.09, 18.03, 6.96
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>  4520 wasadmin  20   0 1178m 576m 4460 S  107  7.2 11:59.54 java
>  8943 wasadmin  20   0  566m 254m 4372 S   15  3.2  2:11.86 java
[snip]

If you're going to use top to track down memory problems, please at least (a) sort by RES, which is what you care about, and (b) make sure top is recording all of the processes using significant memory. Your top display only shows a fraction of the 100+ processes, and it is sorted by CPU use. It's a bit hard to guess what your culprits are from that. Something like

    ps -eao rss,pid,cmd | sort -n

...is probably more useful in determining where all your memory is going.

It would be interesting to understand why one of your processes is running so hard on the CPU when the others are more modest. It may just be the workload, or it may be that trimming your heaps down has left it GC-thrashing, which will make your situation even worse.

The fact that you're deep in swap with Java processes, though, is going to destroy your performance. There's no way around that. Swap to VDISK or swap to physical disk, it doesn't matter: a Linux JVM swapping under z/VM is horrid for performance (z/VM paging isn't much better). I have a couple of years of stress and production data that says so.

[1] And yes, I mean real memory utilisation, not code segments and suchlike.
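The ps suggestion above can be fleshed out a little. A sketch, assuming GNU procps (the --sort option is not in every ps implementation): list the largest resident sets first, then total RSS across all processes.

```shell
# Top resident-set consumers, largest first (GNU ps --sort assumed):
ps -eo rss=,pid=,comm= --sort=-rss | head -5

# Grand total of resident memory across all processes, in MB:
ps -eo rss= | awk '{sum += $1} END {printf "total RSS: %.1f MB\n", sum/1024}'
```

Comparing that total against real memory plus swap-in-use tells you quickly whether the JVMs account for the consumption or whether something else (e.g. kernel/native allocations) is involved.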
Re: OOM Condition on SLES11 running WAS - Tuning problems?
You may have a memory leak in one or more of the Java apps?

/Tore
Re: OOM Condition on SLES11 running WAS - Tuning problems?
I don't think you need to set swappiness to 0. It might be enough to have swap space smaller than the heap size, making Linux very unwilling to swap it. I think they use this trick on some servers here.

/Tore

-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate
Sent: den 27 juli 2010 16:53
To: LINUX-390@VM.MARIST.EDU
Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems?

Yes, but no memory ever gets freed. The system stops responding and is too busy dumping information to the console screen to respond to anything at the console or network.

A system-wide panic would provide me with a core file to analyze and send off. The progress we've made thanks to this group has helped a bit: with swappiness = 0 it remains stable longer, and right now we're using 18GB of swap instead of the 35 we were before. I haven't tried panic_on_oom, but that's a good idea and I was braindead not to think of it.

On Mon, Jul 26, 2010 at 8:48 PM, Shane G wrote:
> Some more info please.
> ... you get an OOM condition?
> ... the/a large consumer gets killed?
> ... the system "halts" (explain)?
>
> You *want* a system-wide panic? If so, setting /proc/sys/vm/panic_on_oom
> to "1" will have the desired effect on non-zLinux.
>
> Shane ...
>
> On Tue, Jul 27th, 2010 at 1:24 AM, Daniel Tate wrote:
>
>> When websphere starts, it consumes all the memory and eventually
>> halts, but does not panic, the system.
> ...
>> At this point I see two problems:
>>
>> 1) Why is OOM Kill not functioning properly
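For reference, the two knobs discussed in this exchange live under /proc/sys/vm. A sketch of inspecting and setting them; the values shown are the ones suggested in the thread, and writing them requires root:

```shell
# Inspect the current settings (readable without root):
cat /proc/sys/vm/swappiness      # 0 = avoid swapping application pages if possible
cat /proc/sys/vm/panic_on_oom    # 1 = panic (so a dump can be taken) instead of oom-killing

# To apply the values discussed above (root required):
#   sysctl -w vm.swappiness=0
#   sysctl -w vm.panic_on_oom=1
# To persist across reboots, add the same keys to /etc/sysctl.conf.
```

Note that panic_on_oom trades availability for diagnosability: the machine goes down hard, but you get a dump instead of the semi-random OOM-kill loop described earlier.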
Re: OOM Condition on SLES11 running WAS - Tuning problems?
I don't work directly with WAS here, but I talk to the guys daily. I would check the heap size against the GC runs and the time they take. It might be much better to split into several smaller Linux WAS servers with less memory, smaller heap sizes, and shorter GC runs. We have several WAS servers, and most of them have less than 2 GB. Usually we start at 1 or 1.2 GB and increase if needed; swap is two 64 MB VDISKs and one physical 380 MB device.

Tuning heap size is critical for response time. With more WAS servers and smaller heap sizes, garbage-collection runs will be much shorter, making total response time better. It very much depends on the Java code as well, of course.

/Tore

-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate
Sent: den 27 juli 2010 00:57
To: LINUX-390@VM.MARIST.EDU
Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems?

We set swappiness to 0 and started all servers; we will see if that helps, and I will look into the SP level upgrade. Does anyone have any good tweaks for z WebSphere, by chance?

On Jul 26, 2010 4:59 PM, "Marcy Cortes" wrote:
> Thanks for that clarification!
>
> Marcy
>
> -----Original Message-----
> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark Post
> Sent: Monday, July 26, 2010 2:21 PM
> To: LINUX-390@vm.marist.edu
> Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?
>
>>>> On 7/26/2010 at 05:05 PM, Marcy Cortes wrote:
>> I was going to suggest a dump and a ticket to Novell, but it looks like
>> you aren't on SP1, and so are unsupported.
>
> SLES11 GA is fully supported until 6 months after SP1 went GA. Even after that, NTS supports the product line throughout its life.
> We just can't get Level 3 to look at a problem after 6 months, which means
> bug fixes won't be created for a customer unless it's against SP1.
>
> Mark Post
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Hmmm - high "sys" CPU usage, high loadavg, system not talking to anyone. Smells like it's busy doing its own stuff. If it were me, I'd want to know trends for things like swap-in and swap-out rates, tasks in uninterruptible sleep, and context-switch counts.

SAR's sampling is too coarse to be of any use even if it did have the data. Set up a background script to run top and vmstat and write to disk every so often; a quick bit of awk should show the trend. You could do all the probing of /proc yourself, but I find it easier to let things like top/ps/vmstat do the grunt work.

Shane ...
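Shane's background-sampler idea might look something like the sketch below. The log path, sample count, and awk column numbers are assumptions for illustration; in practice you would run it under nohup with a much larger sample count.

```shell
# Append timestamped vmstat samples to a log, then trend them with awk.
LOG="${LOG:-/tmp/memtrend.log}"
SAMPLES="${SAMPLES:-3}"            # in practice: a large number, or an endless loop
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$LOG"
    vmstat 1 2 | tail -1 >> "$LOG"   # second report = rates over the last second
    i=$((i + 1))
done

# Quick trend: with the two timestamp fields in front, vmstat's si/so
# (swap-in / swap-out) columns land in fields 9 and 10.
awk '{print $1, $2, "si=" $9, "so=" $10}' "$LOG"
```

A steadily climbing "so" column across samples is the growth-into-swap pattern described earlier in the thread, visible long before the OOM killer fires.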
Re: OOM Condition on SLES11 running WAS - Tuning problems?
The memory sizes here look different from what you posted before. You have 8 GB of memory here (well, 8220492k) but only 780 MB of swap. Your WAS heap sizes totaled around 7 GB if I recall (28 JVMs x 256 MB). You can see the configured maximum heap size with "ps -ef | grep Xmx" and look for the value on the -Xmx option (e.g. -Xmx512m). So, yeah, this time the OOM condition is expected.

Marcy

"This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."
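Marcy's "ps -ef | grep Xmx" check can be made a bit more direct. A sketch: pull out just the -Xmx values so the theoretical maximum heap can be totalled. The sample command line is invented, and the summing one-liner assumes all heaps are specified in megabytes.

```shell
# Sample line standing in for one row of `ps -eo args=` output:
cmdline='/opt/IBM/WebSphere/java/bin/java -Xms50m -Xmx256m com.ibm.ws.runtime.WsServer nodeagent'

# Extract the configured max-heap option:
printf '%s\n' "$cmdline" | grep -oE -- '-Xmx[0-9]+[mMgG]'

# Against a live box (assumption: all heaps given in MB):
#   ps -eo args= | grep -oE -- '-Xmx[0-9]+m' | tr -dc '0-9\n' \
#     | awk '{sum += $1} END {print sum " MB max heap configured"}'
```

Comparing that configured total (plus per-JVM native overhead, as noted earlier in the thread) against real memory plus swap shows immediately whether the box is overcommitted by design.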
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Unfortunately it did not. It continued to grow and grow until oom killed in a loop. We're supposed to have a max size of 512MB per VM. Last time I did any WAS stuff on Linux it was version 3, when it was new.. a long time ago.

top - 14:04:46 up 2:09, 5 users, load average: 60.09, 18.03, 6.96
Tasks: 144 total, 2 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.9%us, 67.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.2%hi, 0.1%si, 1.8%st
Mem: 8220492k total, 8183456k used, 37036k free, 680k buffers
Swap: 779880k total, 779880k used, 0k free, 54288k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 4520 wasadmin  20   0 1178m 576m 4460 S  107  7.2 11:59.54 java
 8943 wasadmin  20   0  566m 254m 4372 S   15  3.2  2:11.86 java
10326 wasadmin  20   0  581m 220m 4224 S   15  2.7  2:01.12 java
 9678 wasadmin  20   0  563m 297m  20m S   15  3.7  2:11.54 java
 4368 wasadmin  20   0 1148m 603m 2848 S   13  7.5 11:55.56 java
 9321 wasadmin  20   0  597m 262m 4720 S   10  3.3  2:15.96 java
11107 wasadmin  20   0  555m 272m 4048 S   10  3.4  2:02.47 java
10706 wasadmin  20   0  509m 241m 2720 S    9  3.0  2:05.35 java
 4920 wasadmin  20   0  540m 228m 3296 S    8  2.8  2:20.82 java
 5298 wasadmin  20   0  586m 253m 2792 S    8  3.2  2:06.52 java
 7794 wasadmin  20   0  594m 307m 4644 S    7  3.8  2:06.89 java
 6021 wasadmin  20   0  542m 208m 2908 S    5  2.6  1:53.18 java
 6205 wasadmin  20   0  562m 226m 2980 S    5  2.8  1:57.61 java
 7265 wasadmin  20   0  517m 218m 2888 S    4  2.7  1:51.77 java
 9992 wasadmin  20   0  624m 223m 2896 S    4  2.8  2:07.83 java
 5481 wasadmin  20   0  707m 254m 4516 S    3  3.2  2:28.43 java
 6390 wasadmin  20   0  509m 167m 3672 S    3  2.1  2:13.46 java
 7516 wasadmin  20   0  531m 272m 2272 S    3  3.4  2:05.57 java
11502 wasadmin  20   0  483m 207m 2772 S    3  2.6  1:17.93 java
12248 wasadmin  20   0  520m 203m 3544 S    3  2.5  1:22.76 java
 6805 wasadmin  20   0  577m 254m 2980 S    3  3.2  2:08.65 java
 5657 wasadmin  20   0  572m 262m  22m S    1  3.3  2:12.52 java
 8615 wasadmin  20   0  574m 255m 2792 S    1  3.2  2:12.48 java
 5068 wasadmin  20   0  580m 243m 2560 S    0  3.0  2:11.30 java
 6586 wasadmin  20   0  580m 220m 3112 S    0  2.8  2:08.78 java
 7036 wasadmin  20   0  664m 244m 2504 S    0  3.0  2:02.78 java
 8066 wasadmin  20   0  578m 233m 3128 S    0  2.9  1:58.66 java
11924 wasadmin  20   0  542m 203m 4780 S    0  2.5  1:21.65 java
 4052 wasadmin  20   0  9968 2624 1188 S    0  0.0  0:00.15 bash
 5008 wasadmin  20   0  6900 1748 1028 S    0  0.0  0:23.92 top
 5181 wasadmin  20   0  9972 2724 1192 S    0  0.0  0:00.22 bash
 5839 wasadmin  20   0  696m 217m 2716 S    0  2.7  1:54.87 java
 8347 wasadmin  20   0  528m 242m  22m S    0  3.0  1:55.78 java

On Tue, Jul 27, 2010 at 10:26 AM, Marcy Cortes wrote:
> WAS calls non-heap memory "native". That's what you seem to be using up.
> Look at this: http://www-01.ibm.com/support/docview.wss?uid=swg21373312
> And if these things don't help, open a PMR with WAS support.
> Did it stabilize at 18G?
> The panic_on_oom is a good idea - you can then get a dump (set up a Linux
> dump volume if you are in a hurry - the VMDUMP command takes all day :)
>
> Marcy
> right now we're using 18GB of swap instead of the 35 we were before. > i haven't tried panic_on_oom, but thats a good idea and i was > braindead not to think of it. > > On Mon, Jul 26, 2010 at 8:48 PM, Shane G wrote: >> Some more info please. >> ... you get a OOM condition ?. >> ... the/a large consumer gets killed ? >> ... the system "halts" (expla
Re: OOM Condition on SLES11 running WAS - Tuning problems?
WAS calls non-heap memory "native". That's what you seem to be using up. Look at this http://www-01.ibm.com/support/docview.wss?uid=swg21373312 And if these things don't help, open a PMR with WAS support. Did it stabilize at 18G? The panic_on_oom is a good idea - you can then get a dump (set up a linux dump volume if you are in a hurry - the VMDUMP command takes all day :) Marcy -Original Message- From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate Sent: Tuesday, July 27, 2010 7:53 AM To: LINUX-390@vm.marist.edu Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems? Yes, but no memory ever gets freed. The system stops responding and is too busy dumping information to the console screen to respond to anything at the console or network. And a system-wide panic would provide me with a core file to analyze and send off; though the progress we've made thanks to this group has helped a bit (swappiness = 0) has helped it remain stable longer.. right now we're using 18GB of swap instead of the 35 we were before. i haven't tried panic_on_oom, but thats a good idea and i was braindead not to think of it. On Mon, Jul 26, 2010 at 8:48 PM, Shane G wrote: > Some more info please. > ... you get a OOM condition ?. > ... the/a large consumer gets killed ? > ... the system "halts" (explain) ?. > > You *want* a system-wide panic ?. If so, setting /proc/sys/vm/panic_on_oom to > "1" will > have the desired effect on non zLinux. > > Shane ... 
> > On Tue, Jul 27th, 2010 at 1:24 AM, Daniel Tate wrote: > >> When websphere starts, it consumes all the memory eventually and >> halts, but not panics, the system. > ... >> At this point i see two problems: >> >> 1) Why is OOM Kill not functioning properly

--
For LINUX-390 subscribe / signoff / archive access instructions, send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit http://wiki.linuxvm.org/
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Yes, but no memory ever gets freed. The system stops responding and is too busy dumping information to the console screen to respond to anything at the console or network. And a system-wide panic would provide me with a core file to analyze and send off; the progress we've made thanks to this group (swappiness = 0) has helped it remain stable longer.. right now we're using 18GB of swap instead of the 35 we were before. i haven't tried panic_on_oom, but that's a good idea and i was braindead not to think of it. On Mon, Jul 26, 2010 at 8:48 PM, Shane G wrote: > Some more info please. > ... you get a OOM condition ?. > ... the/a large consumer gets killed ? > ... the system "halts" (explain) ?. > > You *want* a system-wide panic ?. If so, setting /proc/sys/vm/panic_on_oom to > "1" will > have the desired effect on non zLinux. > > Shane ... > > On Tue, Jul 27th, 2010 at 1:24 AM, Daniel Tate wrote: > >> When websphere starts, it consumes all the memory eventually and >> halts, but not panics, the system. > ... >> At this point i see two problems: >> >> 1) Why is OOM Kill not functioning properly
Re: OOM Condition on SLES11 running WAS - Tuning problems?
On Mon, 26 Jul 2010 10:24:06 -0500, Daniel Tate wrote: > At this point i see two problems: > > 1) Why is OOM Kill not functioning properly > 2) Why is websphere performance so awful? If you run WAS (or any Java app) in swap, performance will be terrible. As a rule of thumb, if you've got 16GB allocated to your JVM heaps (for example) and 10GB for the guest, you're going to see terrible performance sooner or later. (A secondary issue is that Z9 processors are, in my experience, much slower than a modern x86 processor, so if you're expecting a Z9 IFL to offer comparable performance to a Xeon, you are going to be very sadly disappointed.) As far as OOM goes, I've seen this situation on zVM guests in the past, usually with Oracle, and put it down to the VDISK swap being just fast enough that the OOM killer is never triggered, even though the system is unusable. That's just theory on my part, though.
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Some more info please. ... you get a OOM condition ?. ... the/a large consumer gets killed ? ... the system "halts" (explain) ?. You *want* a system-wide panic ?. If so, setting /proc/sys/vm/panic_on_oom to "1" will have the desired effect on non zLinux. Shane ... On Tue, Jul 27th, 2010 at 1:24 AM, Daniel Tate wrote: > When websphere starts, it consumes all the memory eventually and > halts, but not panics, the system. ... > At this point i see two problems: > > 1) Why is OOM Kill not functioning properly
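For reference, Shane's suggestion boils down to the following configuration sketch (standard sysctl names; run as root on the guest — whether the resulting panic yields a usable dump on zLinux depends on having a dump device set up first):

```
# One-off, at runtime:
echo 1 > /proc/sys/vm/panic_on_oom

# Or persistently, via /etc/sysctl.conf:
#   vm.panic_on_oom = 1
# then reload with: sysctl -p

# Verify the current value:
cat /proc/sys/vm/panic_on_oom
```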
Re: OOM Condition on SLES11 running WAS - Tuning problems?
: slabdata 0 0 0
size-2048 170 170 2048 2 1 : tunables 24 12 8 : slabdata 85 85 0
size-1024(DMA) 19 20 1024 4 1 : tunables 54 27 8 : slabdata 5 5 0
size-1024 893 928 1024 4 1 : tunables 54 27 8 : slabdata 232 232 0
size-512(DMA) 18 32 512 8 1 : tunables 54 27 8 : slabdata 4 4 0
size-512 576 576 512 8 1 : tunables 54 27 8 : slabdata 72 72 0
size-256(DMA) 3 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
size-256 16444 16455 256 15 1 : tunables 120 60 8 : slabdata 1097 1097 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
size-64(DMA) 23 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
size-64 3532 6608 64 59 1 : tunables 120 60 8 : slabdata 112 112 0
size-32(DMA) 4 112 32 112 1 : tunables 120 60 8 : slabdata 1 1 0
size-128 942 1110 128 30 1 : tunables 120 60 8 : slabdata 37 37 0
size-32 3261 3360 32 112 1 : tunables 120 60 8 : slabdata 30 30 0
kmem_cache 153 155 768 5 1 : tunables 54 27 8 : slabdata 31 31 0

On Mon, Jul 26, 2010 at 6:17 PM, Marcy Cortes wrote: > Do you run at all before you run out of memory? Have you tried just starting > one of the apps at a time and seeing what each seems to do to memory usage? > Could it be that one of them has a leak? > > I think all the tweaks are really app dependent. You'll have to measure with > both a good VM performance monitor and a good java monitor. > Our largest app finds gencon garbage collection much more efficient (uses > more than 10% less CPU for them). Another found that using Async i/o in WAS > exhibited native memory leak symptoms... But other than that, we don't have > specific tweaks that we do for every app. > > > Marcy
> > > -Original Message- > From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel > Tate > Sent: Monday, July 26, 2010 3:57 PM > To: LINUX-390@vm.marist.edu > Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning > problems? > > We set swappiness to 0 and started all servers. we will see if that helps > and I will look into the sp level upgrade. Does anyone have any good tweaks > for z websphere by chance? > > On Jul 26, 2010 4:59 PM, "Marcy Cortes" > wrote: >> Thanks for that clarification! >> >> >> Marcy >> >> -Original Message- >> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark > Post >> Sent: Monday, July 26, 2010 2:21 PM >> To: LINUX-390@vm.marist.edu >> Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning > problems? >> >>>>> On 7/26/2010 at 05:05 PM, Marcy Cortes > wrote: >>> I was going to suggest a dump and a ticket to Novell, but it looks like > you >>> aren't SP1, and so are unsupported. >> >> SLES11 GA is fully supported until 6 months after SP1 went GA. Even after > that, NTS supports the product line throughout its life. We just can't get > Level 3 to look at a problem after 6 months, which means bug fixes won't be > created for a customer unless it's against SP1. >> >> >> Mark Post
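Since the thread is eyeballing raw slab output, here is a generic sketch for ranking slab caches by approximate footprint. It assumes the 2.6-era `/proc/slabinfo` column layout quoted above (name, active_objs, num_objs, objsize, objperslab, pagesperslab, ...); the sample file and the `rank_slabs` helper name are illustrative, not from the thread:

```shell
# Rank slab caches by approximate memory use (num_objs * objsize),
# largest first. Skips the two slabinfo header lines.
rank_slabs() {
    awk 'NR > 2 { printf "%10.0f KiB  %s\n", $3 * $4 / 1024, $1 }' "$1" | sort -rn
}

# Demo against a captured sample; point it at /proc/slabinfo on the guest.
cat > /tmp/slabinfo.sample <<'EOF'
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...
size-2048 170 170 2048 2 1 : tunables 24 12 8 : slabdata 85 85 0
size-256 16444 16455 256 15 1 : tunables 120 60 8 : slabdata 1097 1097 0
EOF
rank_slabs /tmp/slabinfo.sample
```

In the slab dump above, nothing is anywhere near the gigabytes being lost, which is consistent with the leak living in JVM native allocations rather than kernel slab.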
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Hello, I see you are running 64-bit WAS on Linux; are you running the 32-bit version on Windows? How close is the memory utilization on the Windows machine? regards Phil Tully On 7/26/2010 12:07 PM, Marcy Cortes wrote: First of all, you've run out of memory on that server (Swap: 35764956k total, 35764956k used). It ate all of the 10G and all of the 35G of swap. How many JVM's are running and what are their min/max heap sizes? Marcy -Original Message- From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate Sent: Monday, July 26, 2010 8:24 AM To: LINUX-390@vm.marist.edu Subject: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems? We're running websphere on a z9 under z/VM; 4 systems are live out of 8. It is running apps that consume around 16GB of memory on a Windows machine. On this we have allocated 10G of real storage (RAM) and around 35GB of swap. When websphere starts, it eventually consumes all the memory and halts, but does not panic, the system. We are running 64-bit. I'm a z/VM novice so i don't know much to do.. Here is some information from our WAS Admin: "We are running WebSphere 6.1.0.25 with FP EJB3.0, Webservices and Web 2.0 installed. There are two nodes running 14 application servers each. There are currently 32 applications installed but not currently running. No security has been enabled for WebSphere at this time." At this point i see two problems: 1) Why is OOM Kill not functioning properly 2) Why is websphere performance so awful? 
and have two questions 1) Does anyone have any PRACTICAL experience/tips to optimize SLES11 on z/VM? So far we've been using dated case studies and redbooks that seem to be filled with inaccuracies or outdated information. 2) Is there any way to force a coredump via the cp, like you can with the magic sysrq?

All systems are running the same release and patch level:

[root] bwzld001:~# lsb_release -a
LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
Distributor ID: SUSE LINUX
Description: SUSE Linux Enterprise Server 11 (s390x)
Release: 11
Codename: n/a

Here is a partial top shortly before system death:

top - 08:13:14 up 2 days, 16:08, 2 users, load average: 51.47, 22.20, 10.25
Tasks: 129 total, 4 running, 125 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.7%us, 81.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.3%si, 1.2%st
Mem: 10268344k total, 10220568k used, 47776k free, 548k buffers
Swap: 35764956k total, 35764956k used, 0k free, 56340k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26850 wasadmin 20 0 1506m 253m 2860 S 18 2.5 16:06.28 java
29870 wasadmin 20 0 1497m 279m 2560 S 15 2.8 15:41.13 java
24607 wasadmin 20 0 1502m 223m 2760 S 13 2.2 16:15.14 java
24641 wasadmin 20 0 7229m 1.3g 3172 S 13 13.1 196:35.52 java
26606 wasadmin 20 0 1438m 272m 6212 S 12 2.7 16:02.77 java
27600 wasadmin 20 0 1553m 258m 2920 S 12 2.6 15:46.57 java
24638 wasadmin 20 0 7368m 1.3g 24m S 10 13.7 206:02.05 java
25609 wasadmin 20 0 1528m 219m 2540 S 9 2.2 16:07.33 java
30258 wasadmin 20 0 1515m 249m 2592 S 7 2.5 15:49.79 java
25780 wasadmin 20 0 1604m 277m 2332 S 6 2.8 16:31.41 java
27106 wasadmin 20 0 1458m 273m 2472 S 6 2.7 15:59.13 java
27336 wasadmin 20 0 1528m 238m 2540 S 5 2.4 15:38.82 java
29164 wasadmin 20 0 1527m 224m 2608 S 5 2.2 16:02.56 java
31400 wasadmin 20 0 1509m 259m 2468 S 5 2.6 15:26.38 java
25244 wasadmin 20 0 1509m 290m 2624 S 5 2.9 16:16.07 java
24769 wasadmin 20 0 1409m 259m 2308 S 5 2.6 16:08.12 java
28796 wasadmin 20 0 1338m 263m 3076 S 4 2.6 15:47.72 java
26185 wasadmin 20 0 1493m 274m 2304 S 2 2.7 16:01.97 java
25968 wasadmin 20 0 1427m 257m 2532 S 1 2.6 15:51.50 java
29495 wasadmin 20 0 1466m 259m 2260 S 1 2.6 15:31.82 java
25080 wasadmin 20 0 1445m 236m 2472 S 0 2.4 15:53.19 java
26410 wasadmin 20 0 1475m 271m 2540 S 0 2.7 15:52.48 java
31027 wasadmin 20 0 1413m 238m 2492 S 0 2.4 15:29.78 java
3695 wasadmin 20 0 9968 1352 1352 S 0 0.0 0:00.13 bash
24474 wasadmin 20 0 1468m 205m 2472 S 0 2.0 16
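To track how much of the guest's real memory the JVMs actually hold over time, summing the per-process RSS is a quick check (a generic sketch; the `sum_rss` helper name and the sample values are illustrative — on a live guest you would pipe `ps -C java -o rss=` into it):

```shell
# ps reports RSS in KiB, one value per process; summing them shows
# the total resident memory held by the java processes.
sum_rss() {
    awk '{ sum += $1 } END { printf "java RSS total: %.1f MiB\n", sum / 1024 }'
}

# Demo with two sample RSS values (KiB):
printf '253952\n286720\n' | sum_rss
# On the guest: ps -C java -o rss= | sum_rss
```

Running this from cron every few minutes gives the trend data Shane asked for without keeping an interactive top session alive on a dying system.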
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Do you run at all before you run out of memory? Have you tried just starting one of the apps at a time and seeing what each seems to do to memory usage? Could it be that one of them has a leak? I think all the tweaks are really app dependent. You'll have to measure with both a good VM performance monitor and a good java monitor. Our largest app finds gencon garbage collection much more efficient (uses more than 10% less CPU for them). Another found that using Async i/o in WAS exhibited native memory leak symptoms... But other than that, we don't have specific tweaks that we do for every app. Marcy -Original Message- From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate Sent: Monday, July 26, 2010 3:57 PM To: LINUX-390@vm.marist.edu Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems? We set swappiness to 0 and started all servers. we will see if that helps and I will look into the sp level upgrade. Does anyone have any good tweaks for z websphere by chance? On Jul 26, 2010 4:59 PM, "Marcy Cortes" wrote: > Thanks for that clarification! > > > Marcy > > -Original Message- > From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark Post > Sent: Monday, July 26, 2010 2:21 PM > To: LINUX-390@vm.marist.edu > Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems? > >>>> On 7/26/2010 at 05:05 PM, Marcy Cortes wrote: >> I was going to suggest a dump and a ticket to Novell, but it looks like you >> aren't SP1, and so are unsupported. 
> > SLES11 GA is fully supported until 6 months after SP1 went GA. Even after that, NTS supports the product line throughout its life. We just can't get Level 3 to look at a problem after 6 months, which means bug fixes won't be created for a customer unless it's against SP1. > > > Mark Post
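Marcy's gencon suggestion corresponds to a JVM argument on the IBM J9 JVMs that WAS 6.1 ships with; a hedged example of the kind of generic JVM arguments involved (the heap figures here match the 50/256 sizing mentioned in the thread, but treat the whole line as illustrative rather than a recommendation):

```
-Xms50m -Xmx256m -Xgcpolicy:gencon -verbose:gc
```

In the WAS admin console these go under the application server's Java Virtual Machine settings; `-verbose:gc` is worth adding while diagnosing, since it shows whether memory growth is heap (GC-visible) or native (invisible to GC, as suspected here).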
Re: OOM Condition on SLES11 running WAS - Tuning problems?
We set swappiness to 0 and started all servers. we will see if that helps and I will look into the sp level upgrade. Does anyone have any good tweaks for z websphere by chance? On Jul 26, 2010 4:59 PM, "Marcy Cortes" wrote: > Thanks for that clarification! > > > Marcy > > -Original Message- > From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark Post > Sent: Monday, July 26, 2010 2:21 PM > To: LINUX-390@vm.marist.edu > Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems? > >>>> On 7/26/2010 at 05:05 PM, Marcy Cortes wrote: >> I was going to suggest a dump and a ticket to Novell, but it looks like you >> aren't SP1, and so are unsupported. > > SLES11 GA is fully supported until 6 months after SP1 went GA. Even after that, NTS supports the product line throughout its life. We just can't get Level 3 to look at a problem after 6 months, which means bug fixes won't be created for a customer unless it's against SP1. > > > Mark Post
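For reference, the swappiness change described above, as a minimal configuration sketch (standard sysctl names; run as root on the guest):

```
# Runtime change (lost at reboot):
sysctl -w vm.swappiness=0

# Persistent, assuming /etc/sysctl.conf is applied at boot:
echo 'vm.swappiness = 0' >> /etc/sysctl.conf
```

Note that swappiness=0 only tells the kernel to avoid swapping anonymous pages while it can; it will not prevent the exhaustion if the JVMs genuinely leak native memory.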
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Thanks for that clarification! Marcy -Original Message- From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark Post Sent: Monday, July 26, 2010 2:21 PM To: LINUX-390@vm.marist.edu Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems? >>> On 7/26/2010 at 05:05 PM, Marcy Cortes >>> wrote: > I was going to suggest a dump and a ticket to Novell, but it looks like you > aren't SP1, and so are unsupported. SLES11 GA is fully supported until 6 months after SP1 went GA. Even after that, NTS supports the product line throughout its life. We just can't get Level 3 to look at a problem after 6 months, which means bug fixes won't be created for a customer unless it's against SP1. Mark Post
Re: OOM Condition on SLES11 running WAS - Tuning problems?
>>> On 7/26/2010 at 05:05 PM, Marcy Cortes >>> wrote: > I was going to suggest a dump and a ticket to Novell, but it looks like you > aren't SP1, and so are unsupported. SLES11 GA is fully supported until 6 months after SP1 went GA. Even after that, NTS supports the product line throughout its life. We just can't get Level 3 to look at a problem after 6 months, which means bug fixes won't be created for a customer unless it's against SP1. Mark Post
Re: OOM Condition on SLES11 running WAS - Tuning problems?
I was going to suggest a dump and a ticket to Novell, but it looks like you aren't SP1, and so are unsupported. Anyway, could you apply that and try again? Marcy -Original Message- From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate Sent: Monday, July 26, 2010 11:28 AM To: LINUX-390@vm.marist.edu Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems? Yeah, i saw that.. problem is these same apps run on 16GB of mem on a windows box.. We have 28 JVMs and sizes are set to 50/256. On Mon, Jul 26, 2010 at 11:07 AM, Marcy Cortes < marcy.d.cor...@wellsfargo.com> wrote: > First of all, you've run out of memory on that server (Swap: 35764956k > total, 35764956k used,) > It ate all of the 10G and all of the 35G of swap. > How many JVM's are running and what are their min/max heap sizes? > > > Marcy > > -Original Message- > From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of > Daniel Tate > Sent: Monday, July 26, 2010 8:24 AM > To: LINUX-390@vm.marist.edu > Subject: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems? 
> > We're running websphere on a z9 under z/VM; 4 systems are live out of 8. > It > is running apps that consume around 16GB of memory on a Windows machine. > On > this, we have allocated 10G of real storage (RAM) and around 35GB of > swap. When websphere starts, it consumes all the memory eventually and > halts, but not panics, the system. We are running 64-Bit. I'm a z/VM > novice so i don't know much to do.. > > Here is some information from our WAS Admin: > "We are running WebSphere 6.1.0.25 with FP EJB3.0, Webservices and Web 2.0 > installed. There are two nodes running 14 application servers each. There > are currently 32 applications installed but not currently running. No > security has been enabled for WebSphere at this time." > > > At this point i see two problems: > > 1) Why is OOM Kill not functioning properly > 2) Why is websphere performance so awful? > > and have two questions > > 1) Does anyone have any PRACTICAL experience/tips to optimize SLES11 on > z/VM? So far we've been using dated case studies and redbooks that seem to > be filled with inaccuracies or outdated information. > 2) Is there any way to force a coredump via the cp, like you can with the > magic sysrq? 
> > All systems are running the same release and patch level:
> >
> > [root] bwzld001:~# lsb_release -a
> > LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
> > Distributor ID: SUSE LINUX
> > Description: SUSE Linux Enterprise Server 11 (s390x)
> > Release: 11
> > Codename: n/a
> >
> > Here is a partial top shortly before system death:
> >
> > top - 08:13:14 up 2 days, 16:08, 2 users, load average: 51.47, 22.20, 10.25
> > Tasks: 129 total, 4 running, 125 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 16.7%us, 81.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.3%si, 1.2%st
> > Mem: 10268344k total, 10220568k used, 47776k free, 548k buffers
> > Swap: 35764956k total, 35764956k used, 0k free, 56340k cached
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 26850 wasadmin 20 0 1506m 253m 2860 S 18 2.5 16:06.28 java
> > 29870 wasadmin 20 0 1497m 279m 2560 S 15 2.8 15:41.13 java
> > 24607 wasadmin 20 0 1502m 223m 2760 S 13 2.2 16:15.14 java
> > 24641 wasadmin 20 0 7229m 1.3g 3172 S 13 13.1 196:35.52 java
> > 26606 wasadmin 20 0 1438m 272m 6212 S 12 2.7 16:02.77 java
> > 27600 wasadmin 20 0 1553m 258m 2920 S 12 2.6
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Set swappiness to 0. Can you just start 1 node as a test? Ray > -Original Message- > From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On > Behalf Of Daniel Tate > Sent: Monday, July 26, 2010 2:28 PM > To: LINUX-390@VM.MARIST.EDU > Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems? > > Yeah, i saw that.. problem is these same apps run on 16GB of mem on a > windows box.. > > We have 28 JVMs and sizes are set to 50/256. > > On Mon, Jul 26, 2010 at 11:07 AM, Marcy Cortes < > marcy.d.cor...@wellsfargo.com> wrote: > > > First of all, you've run out of memory on that server > (Swap: 35764956k > > total, 35764956k used,) > > It ate all of the 10G and all of the 35G of swap. > > How many JVM's are running and what are their min/max heap sizes? > > > > Marcy > > > > -Original Message- > > From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On > Behalf Of > > Daniel Tate > > Sent: Monday, July 26, 2010 8:24 AM > > To: LINUX-390@vm.marist.edu > > Subject: [LINUX-390] OOM Condition on SLES11 running WAS - > Tuning problems? > > > > We're running websphere on a z9 under z/VM; 4 systems are > live out of 8. > > It > > is running apps that consume around 16GB of memory on a > Windows machine. > > On > > this, we have allocated 10G of real storage (RAM) and around 35GB of > > swap. When websphere starts, it consumes all the memory > eventually and > > halts, but not panics, the system. We are running > 64-Bit. I'm a z/VM > > novice so i don't know much to do.. 
> > > > Here is some information from our WAS Admin: > > "We are running WebSphere 6.1.0.25 with FP > EJB3.0,Webservices and Web 2.0 > > installed. There are two nodes running 14 application > servers each. there > > are currently 32 applications installed but not currently > running. No > > security has been enabled for WebSphere at this time." > > > > > > At this point i see two problems: > > > > 1) Why is OOM Kill not functioning properly > > 2) Why is websphere performance so awful? > > > > and have two questions > > > > 1) Does anyone have any PRACTICAL experience/tips to > optimize SLES11 on > > z/VM? So far we've been using dated case studies and > redbooks that seem to > > be filled with inaccuracies or outdated information. > > 2) Is there any way to force a coredump via the cp, like > you can with the > > magic sysrq? > > > > All systems are running the same release and patch level: > > > > [root] bwzld001:~# lsb_release -a > > LSB Version: > > > > > core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x :core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-> s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:g raphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-> s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390: graphics-4.0-s390x > > Distributor ID:SUSE LINUX > > Description:SUSE Linux Enterprise Server 11 (s390x) > > Release:11 > > Codename:n/a > > > > > > Here is a partial top shortly before system death: > > > > top - 08:13:14 up 2 days, 16:08, 2 users, load average: > 51.47, 22.20, > > 10.25 > > Tasks: 129 total, 4 running, 125 sleeping, 0 stopped, 0 zombie > > Cpu(s): 16.7%us, 81.5%sy, 0.0%ni, 0.0%id, 0.0%wa, > 0.3%hi, 0.3%si, > > 1.2%st > > Mem: 10268344k total, 10220568k used,47776k free, > 548k buffers > > Swap: 35764956k total, 35764956k used,0k free, > 56340k cached > > > > PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ > > COMMAND > > > > 26850 wasadmin 20 0 1506m 253m 2860 S 18 2.5 16:06.28 > > java > > 
29870 wasadmin 20 0 1497m 279m 2560 S 15 2.8 15:41.13 > > java > > 24607 wasadmin 20 0 1502m 223m 2760 S 13 2.2 16:15.14 > > java > > 24641 wasadmin 20 0 7229m 1.3g 3172 S 13 13.1 196:35.52 > > java > > 26606 wasadmin 20 0 1438m 272m 6212 S 12
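Ray's suggestion can be applied on the SLES guest roughly like this (a sketch; `vm.swappiness=0` tells the kernel to avoid swapping anonymous pages for as long as possible, which on a memory-overcommitted z/VM guest trades swap churn for earlier OOM pressure):

```shell
# Current value (the stock Linux default is 60)
cat /proc/sys/vm/swappiness

# To apply the suggestion at runtime (as root):
#   sysctl -w vm.swappiness=0
# and persist it across reboots with this line in /etc/sysctl.conf:
#   vm.swappiness = 0
```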
Re: OOM Condition on SLES11 running WAS - Tuning problems?
Yeah, I saw that... the problem is these same apps run in 16GB of memory on a Windows box. We have 28 JVMs, and heap sizes are set to 50/256.

On Mon, Jul 26, 2010 at 11:07 AM, Marcy Cortes <marcy.d.cor...@wellsfargo.com> wrote:
> First of all, you've run out of memory on that server (Swap: 35764956k
> total, 35764956k used).
> It ate all of the 10G and all of the 35G of swap.
> How many JVM's are running and what are their min/max heap sizes?
>
> Marcy
>
> [quoted original post and top output trimmed; they appear in full below]
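A quick back-of-the-envelope check on those numbers: 28 JVMs at the quoted 50/256 heap setting put a hard floor under memory use even before any leak. A sketch (the 150 MB per-JVM native overhead is an assumed illustrative figure, not a measurement):

```shell
jvms=28          # 14 application servers on each of 2 nodes
max_heap_mb=256  # -Xmx from the 50/256 heap setting
native_mb=150    # assumed per-JVM native footprint (JIT, classes, threads)

total_mb=$(( jvms * (max_heap_mb + native_mb) ))
echo "worst case: ${total_mb} MB"   # -> worst case: 11368 MB
```

Even that worst case (~11 GB) only just exceeds the 10 GB of real storage, and is nowhere near the ~45 GB of RAM plus swap actually consumed, which is what points at native-memory growth rather than Java heap.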
Re: OOM Condition on SLES11 running WAS - Tuning problems?
First of all, you've run out of memory on that server (Swap: 35764956k total, 35764956k used). It ate all of the 10G and all of the 35G of swap. How many JVM's are running and what are their min/max heap sizes?

Marcy

"This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."

-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate
Sent: Monday, July 26, 2010 8:24 AM
To: LINUX-390@vm.marist.edu
Subject: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?

[quoted original post trimmed; it appears in full below]
OOM Condition on SLES11 running WAS - Tuning problems?
We're running WebSphere on a z9 under z/VM; 4 systems are live out of 8. It is running apps that consume around 16GB of memory on a Windows machine. On this system we have allocated 10GB of real storage (RAM) and around 35GB of swap. When WebSphere starts, it eventually consumes all the memory and halts (but does not panic) the system. We are running 64-bit. I'm a z/VM novice, so I don't know much to do.

Here is some information from our WAS admin: "We are running WebSphere 6.1.0.25 with FP EJB 3.0, Webservices and Web 2.0 installed. There are two nodes running 14 application servers each. There are currently 32 applications installed but not currently running. No security has been enabled for WebSphere at this time."

At this point I see two problems:

1) Why is OOM Kill not functioning properly?
2) Why is WebSphere performance so awful?

and have two questions:

1) Does anyone have any PRACTICAL experience/tips to optimize SLES11 on z/VM? So far we've been using dated case studies and redbooks that seem to be filled with inaccuracies or outdated information.
2) Is there any way to force a coredump via CP, like you can with the magic sysrq?
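On question 2, the Linux side of this is reachable without CP via the magic-sysrq interface in procfs. A sketch (the CP `VMDUMP` route is mentioned from memory; verify the operands against the CP command reference for your z/VM level before relying on it):

```shell
# Is magic sysrq enabled? (0 = disabled, 1 = all functions)
cat /proc/sys/kernel/sysrq

# As root, enable it and trigger a dump without a keyboard:
#   echo 1 > /proc/sys/kernel/sysrq
#   echo m > /proc/sysrq-trigger   # dump memory state to the console log
#   echo c > /proc/sysrq-trigger   # force a crash (and kdump, if configured)
# From the 3270 console, CP can also snapshot guest storage:
#   #CP VMDUMP ...                 # see the z/VM CP Commands reference
```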
All systems are running the same release and patch level:

[root] bwzld001:~# lsb_release -a
LSB Version:    core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
Distributor ID: SUSE LINUX
Description:    SUSE Linux Enterprise Server 11 (s390x)
Release:        11
Codename:       n/a

Here is a partial top shortly before system death:

top - 08:13:14 up 2 days, 16:08, 2 users, load average: 51.47, 22.20, 10.25
Tasks: 129 total, 4 running, 125 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.7%us, 81.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.3%si, 1.2%st
Mem:  10268344k total, 10220568k used,    47776k free,      548k buffers
Swap: 35764956k total, 35764956k used,        0k free,    56340k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
26850 wasadmin 20  0 1506m 253m 2860 S   18  2.5  16:06.28 java
29870 wasadmin 20  0 1497m 279m 2560 S   15  2.8  15:41.13 java
24607 wasadmin 20  0 1502m 223m 2760 S   13  2.2  16:15.14 java
24641 wasadmin 20  0 7229m 1.3g 3172 S   13 13.1 196:35.52 java
26606 wasadmin 20  0 1438m 272m 6212 S   12  2.7  16:02.77 java
27600 wasadmin 20  0 1553m 258m 2920 S   12  2.6  15:46.57 java
24638 wasadmin 20  0 7368m 1.3g  24m S   10 13.7 206:02.05 java
25609 wasadmin 20  0 1528m 219m 2540 S    9  2.2  16:07.33 java
30258 wasadmin 20  0 1515m 249m 2592 S    7  2.5  15:49.79 java
25780 wasadmin 20  0 1604m 277m 2332 S    6  2.8  16:31.41 java
27106 wasadmin 20  0 1458m 273m 2472 S    6  2.7  15:59.13 java
27336 wasadmin 20  0 1528m 238m 2540 S    5  2.4  15:38.82 java
29164 wasadmin 20  0 1527m 224m 2608 S    5  2.2  16:02.56 java
31400 wasadmin 20  0 1509m 259m 2468 S    5  2.6  15:26.38 java
25244 wasadmin 20  0 1509m 290m 2624 S    5  2.9  16:16.07 java
24769 wasadmin 20  0 1409m 259m 2308 S    5  2.6  16:08.12 java
28796 wasadmin 20  0 1338m 263m 3076 S    4  2.6  15:47.72 java
26185 wasadmin 20  0 1493m 274m 2304 S    2  2.7  16:01.97 java
25968 wasadmin 20  0 1427m 257m 2532 S    1  2.6  15:51.50 java
29495 wasadmin 20  0 1466m 259m 2260 S    1  2.6  15:31.82 java
25080 wasadmin 20  0 1445m 236m 2472 S    0  2.4  15:53.19 java
26410 wasadmin 20  0 1475m 271m 2540 S    0  2.7  15:52.48 java
31027 wasadmin 20  0 1413m 238m 2492 S    0  2.4  15:29.78 java
 3695 wasadmin 20  0  9968 1352 1352 S    0  0.0   0:00.13 bash
24474 wasadmin 20  0 1468m 205m 2472 S    0  2.0  16:03.63 java
24920 wasadmin 20  0 1522m 263m 2616 S    0  2.6  16:06.29 java
25422 wasadmin 20  0 1584m 229m 2284 S    0  2.3  16:02.18 java
27892 wasadmin 20  0 1414m 263m 2648 S    0  2.6  15:45.96 java
28184 wasadmin 20  0 1523m 241m 2320 S    0  2.4  15:42.21 java
28486 wasadmin 20  0 1450m 231m 2288 S    0  2.3  15:46.53 java
30625 wasadmin 20  0 1477m 251m 3024 S    0  2.5  15:44.80 java

Here are a few screen grabs from the 3270 console session:

Unless you get a _continuous_flood_ of these messages it means everything is working fine. Allocations from irqs cannot be perfectly reliable and the kernel is designed to handle that.

java: page allocation failure. order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 0001ab64c638, ksp: 000215bbb5e0)
00027fbcf7b0 0002 00027fbcf850 00027fbcf7c8 00027fbcf7c8 003b6696 014a4e88 0007 00634e00 000d 00027fbcf818 0
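Since no heap dumps survive the crash, trend data captured before the box wedges is the next best evidence. A minimal background sampler using only /proc (a sketch; the log path and 60-second interval are arbitrary choices, and vmstat/top output can be appended the same way):

```shell
LOG=/tmp/memtrend.log

snapshot() {
  {
    date '+%F %T'
    # free memory, free swap, and total committed address space
    grep -E '^(MemFree|SwapFree|Committed_AS):' /proc/meminfo
    # load averages plus runnable/total task counts
    cat /proc/loadavg
  } >> "$LOG"
}

snapshot   # one sample; in practice: while true; do snapshot; sleep 60; done &
```

A quick pass with awk over the resulting log shows whether Committed_AS climbs steadily while the system sits idle, which is the signature of the nodeagent leak described above.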