Re: OOM Condition on SLES11 running WAS - Tuning problems? - Possible Solution

2010-08-18 Thread Daniel Tate
We disabled our LDAP SSO after I noticed, in an strace -f -p [pid] of
the nodeagent, that all spawned threads were making LDAP calls to the
internal authentication server.  This opens up other questions, like
why LDAP would cause a memory leak in a raw (no applications)
WebSphere installation, but at least the memory leak seems to have
been solved.
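For anyone wanting to reproduce the check, the strace session can be sketched roughly like this (the process-name match and the LDAP port are assumptions about a default install; adjust to your environment):

```shell
# Find the nodeagent JVM and trace only network-related syscalls for a
# short window, then count connections to the standard LDAP port (389).
pid=$(pgrep -f nodeagent | head -n1)
ldap_connects=0
if [ -n "$pid" ]; then
    timeout 30 strace -f -p "$pid" -e trace=network -o /tmp/nodeagent.trace
    ldap_connects=$(grep -c 'sin_port=htons(389)' /tmp/nodeagent.trace)
fi
echo "LDAP connects observed: $ldap_connects"
```

If SSL is in play, port 636 is the usual suspect instead.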

On Wed, Aug 11, 2010 at 2:01 PM, Daniel Tate wrote:
> [earlier thread quoted in full; the messages appear below in this digest]

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit
http://wiki.linuxvm.org/


Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-08-11 Thread Daniel Tate
Has anyone gotten this to work successfully on SLES11?  Other versions
of SLES?  Red Hat?

Here is a description of the problem from the WebSphere admin's side:


We are running 6.1.0.25 ND on zLinux (SLES 11). We have 2 nodes created
on a server with 14 application servers per node. All JVM sizes are set
to the default 50/256. No applications are installed.

The server has 10GB of real memory and 30GB of swap. We start the node
agents, then start all application servers. Once everything is up,
approximately 7GB of memory is used.

With the server sitting idle, the memory used by the node agents
continues to grow to well over 1GB. Memory is continuously consumed
until all real memory and swap are exhausted. At this point the server
becomes unavailable and eventually kills off all java processes on the
machine. No heap dumps have ever been generated.


On Wed, Aug 11, 2010 at 1:27 PM, Daniel Tate wrote:
> [earlier messages quoted in full; they appear below in this digest]



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-08-11 Thread Daniel Tate
It's the node agents that are consuming the memory.  The weird thing is
that RSS and VIRT are both larger than what WAS shows in the console, and
VIRT grows even though no swapping was active.

We simply cannot explain this behavior.  We have PMRs open for every
possible avenue here.  So I'll ask the question another way:

Does anyone have WebSphere (Network Deployment edition) running
successfully on s390x, SLES11 GA?


On Wed, Jul 28, 2010 at 9:05 AM, Agblad Tore wrote:
> [message quoted in full; it appears below in this digest]



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-28 Thread Agblad Tore
And to make it somewhat easier to log in :) start just a few servers at
boot, then start one more at a time until you see problems.
Then stop that last one, and either increase memory or move the
not-yet-started servers onto one or more other Linux machines.


___
Tore Agblad
Volvo Information Technology
Infrastructure Mainframe Design & Development, Linux servers
Dept 4352  DA1S 
SE-405 08, Gothenburg  Sweden

Telephone: +46-31-3233569
E-mail: tore.agb...@volvo.com

http://www.volvo.com/volvoit/global/en-gb/

-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Shane G
Sent: 28 July 2010 02:51
To: LINUX-390@VM.MARIST.EDU
Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems?

[message quoted in full; see Shane G's post later in this digest]



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-28 Thread Rodger Donaldson

Daniel Tate wrote:

Unfortunately it did not.  It continued to grow and grow until oom
killed in a loop.  We're supposed to have a max size of 512MB per VM.


A max heap of 512 MB per VM doesn't mean that each VM will only use
512 MB; it means that it will grow to 512 MB plus any extra memory
required by the JVM itself.  This can actually be quite large; I have
JVMs that are half again as big as their heap in some cases[1].
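To see that gap on a live system, compare a process's resident set with its configured heap. A minimal sketch (it reads /proc directly; the current shell's PID is used here so the snippet runs anywhere, but in practice substitute a java PID):

```shell
# VmRSS in /proc/<pid>/status is the resident set in kB. For a JVM it
# covers the Java heap plus native allocations: thread stacks, JIT code
# caches, NIO buffers, and the JVM's own data structures.
pid=$$
rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/"$pid"/status)
echo "PID $pid resident set: ${rss_kb} kB"
```

For a JVM with -Xmx512m, an RSS well above 512 MB is exactly the "heap plus native" effect described above.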


Last time i did any WAS stuff was on Linux but on version 3 when it
was new.. a long time ago.

top - 14:04:46 up  2:09,  5 users,  load average: 60.09, 18.03, 6.96


  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 4520 wasadmin  20   0 1178m 576m 4460 S  107  7.2  11:59.54 java
 8943 wasadmin  20   0  566m 254m 4372 S   15  3.2   2:11.86 java


[snip]

If you're going to use top to track down memory problems, please at
least (a) sort by RES, which is what you care about, and (b) make sure
top is recording all of the processes using significant memory.  Your
top display only shows a fraction of the 100+ processes, and is sorted
by CPU use.  Bit hard to guess what your culprits are from that.

Something like

ps -eao rss,pid,cmd | sort -n

...is probably more useful in determining where all your memory is going.
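A small aggregation on top of that one-liner shows where memory concentrates per command name (a sketch; rss= and comm= are standard procps format specifiers that suppress the header line):

```shell
# Sum resident memory (kB) per command name and print the five largest.
top5=$(ps -eo rss=,comm= \
    | awk '{sum[$2] += $1} END {for (c in sum) print sum[c], c}' \
    | sort -n | tail -n 5)
echo "$top5"
```

On the system described in this thread, the java entry should dominate by a wide margin.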

It would be interesting to understand why one of your processes is
running so hard on the CPU when the others are more modest.  It may just
be the workload, or it may be that trimming your heaps down has left it
GC-thrashing, which will make your situation even worse.

The fact you're deep in swap with Java processes, though, is going to
destroy your performance.  There's no way around that.  Swap to VDISK,
swap to physical disk, it doesn't matter.  JVM + zLinux + zVM + Linux
swapping is horrid for performance (zVM paging isn't much better).  I
have a couple of years of stress and production data that says so.

[1] And yes, I mean real memory utilisation, not code segments and suchlike.



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-28 Thread Agblad Tore
You may have a memory leak in one or more of the Java apps?



-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate
Sent: 27 July 2010 21:30
To: LINUX-390@VM.MARIST.EDU
Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems?

[message and quoted thread appear in full later in this digest]

Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-28 Thread Agblad Tore
I don't think you need to set swappiness to 0.
It might be enough to have swap space smaller than the heap size,
making Linux very unwilling to swap it.
I think they use this trick here on some servers.
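The swappiness knob discussed in the thread can be read and set as below (a sketch; 0 is the value the thread experimented with, not a general recommendation):

```shell
# 0 tells the kernel to avoid swapping anonymous pages (e.g. JVM heap)
# for as long as possible; higher values swap more eagerly.
sw=$(cat /proc/sys/vm/swappiness)
echo "current vm.swappiness: $sw"
# To change it at runtime (root):      sysctl -w vm.swappiness=0
# To persist, add to /etc/sysctl.conf: vm.swappiness = 0
```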



-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate
Sent: 27 July 2010 16:53
To: LINUX-390@VM.MARIST.EDU
Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems?

Yes, but no memory ever gets freed.  The system stops responding and
is too busy dumping information to the console to respond to anything
at the console or over the network.

A system-wide panic would give me a core file to analyze and send off.
The progress we've made thanks to this group (swappiness = 0) has
helped it remain stable longer: right now we're using 18GB of swap
instead of the 35 we were using before.  I haven't tried panic_on_oom,
but that's a good idea and I was braindead not to think of it.

On Mon, Jul 26, 2010 at 8:48 PM, Shane G  wrote:
> Some more info please.
> ... you get a OOM condition ?.
> ... the/a large consumer gets killed ?
> ... the system "halts" (explain) ?.
>
> You *want* a system-wide panic ?. If so, setting /proc/sys/vm/panic_on_oom to 
> "1" will
> have the desired effect on non zLinux.
>
> Shane ...
>
> On Tue, Jul 27th, 2010 at 1:24 AM, Daniel Tate  wrote:
>
>> When websphere starts, it consumes all the memory eventually and
>> halts, but not panics, the system.
>  ...
>> At this point i see two problems:
>>
>> 1) Why is OOM Kill not functioning properly
>
>



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-28 Thread Agblad Tore
I don't work directly with WAS here, but I talk to the guys daily.
I would check the heap size against GC runs and the time those take.
It might be much better to split into several smaller Linux WAS servers
with less memory, smaller heap sizes, and shorter GC runs.

We have several WAS servers, and most of them have less than 2 GB.
Usually we start at 1 or 1.2 GB and increase if needed; swap is
two 64 MB VDISKs and one 380 MB physical device.
Tuning heap size is critical for response time.
With more WAS servers with smaller heap sizes, garbage collection runs
will be much shorter, making total response time better.

It very much depends on the Java code as well, of course.
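In WAS these sizes end up in each server's generic JVM arguments; a smaller-heap configuration along the lines Tore describes might look like the following (illustrative values only, not recommendations; -Xverbosegclog is an IBM J9 option, J9 being the JVM WAS ships on zLinux):

```
-Xms256m -Xmx512m -verbose:gc -Xverbosegclog:/tmp/server1_gc.log
```

Watching the GC log then shows whether pause times actually shrink as the heap comes down.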

/Tore 



-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel Tate
Sent: 27 July 2010 00:57
To: LINUX-390@VM.MARIST.EDU
Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems?

We set swappiness to 0 and started all servers.  We will see if that
helps, and I will look into the SP level upgrade.  Does anyone have any
good tweaks for z WebSphere, by chance?

On Jul 26, 2010 4:59 PM, "Marcy Cortes" 
wrote:
> Thanks for that clarification!
>
>
> Marcy
>
> -Original Message-
> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark
Post
> Sent: Monday, July 26, 2010 2:21 PM
> To: LINUX-390@vm.marist.edu
> Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning
problems?
>
>>>> On 7/26/2010 at 05:05 PM, Marcy Cortes 
wrote:
>> I was going to suggest a dump and a ticket to Novell, but it looks like
you
>> aren't SP1, and so are unsupported.
>
> SLES11 GA is fully supported until 6 months after SP1 went GA. Even after
that, NTS supports the product line throughout its life. We just can't get
Level 3 to look at a problem after 6 months, which means bug fixes won't be
created for a customer unless it's against SP1.
>
>
> Mark Post
>



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-27 Thread Shane G
Hmmm - high "sys" CPU usage, high loadavg, system not talking to anyone.
Smells like it's busy doing its own stuff. If it were me I'd want to know
trends for things like swap-in and swap-out rates, tasks in
uninterruptible sleep, and context switch counts.

SAR is too granular to be of any use even if it did have the data. Set up
a background script to run top and vmstat and write to disk every so
often. A quick bit of awk should show the trend. You could do all the
probing of /proc yourself, but I find it easier to let things like
top/ps/vmstat do all the grunt work.

Shane ... 
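A sketch of that background sampler, with the awk step shown against a canned pair of vmstat data lines so the trend extraction is concrete (in production, replace the heredoc with something like `vmstat 1 2 | tail -n1 >> "$log"` inside a sleep loop):

```shell
# Fields 7 and 8 of a vmstat data line are si (swap-in) and so
# (swap-out); averaging them over many samples exposes the trend.
log=$(mktemp)
cat >> "$log" <<'EOF'
 1  0 779880  37036    680  54288  512 2048  100  300 1200 3400 31 67  0  0
 2  1 779880  36010    680  54100  768 4096  120  310 1300 3900 30 68  0  0
EOF
avg_si=$(awk '{si += $7; n++} END {print int(si / n)}' "$log")
avg_so=$(awk '{so += $8; n++} END {print int(so / n)}' "$log")
echo "avg si=${avg_si} so=${avg_so}"   # avg si=640 so=3072
```

A steadily climbing so column over hours of samples is the signature of the slow leak being chased in this thread.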



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-27 Thread Marcy Cortes
The memory sizes here look different from what you posted before.
You have 8G of memory here (well, 8220492k) but only 780M of swap.
Your WAS heap sizes totaled around 7G if I recall (28 JVMs x 256M).
You can see what the configured max heap is with "ps -ef | grep Xmx";
look for the value on the Xmx option (e.g. -Xmx512m).
So, yeah, this time the OOM condition is expected.
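Marcy's check can be extended to total the configured maximum heap across all running JVMs (a sketch; it assumes every -Xmx uses the "m" suffix, as the values in this thread do):

```shell
# Extract each -Xmx value in MB from the process table and sum them;
# prints 0 when no JVMs are running.
total_mb=$(ps -ef \
    | grep -o 'Xmx[0-9]*m' \
    | tr -d 'Xxm' \
    | awk '{t += $1} END {print t + 0}')
echo "total configured max heap: ${total_mb} MB"
```

Comparing that total against real memory (minus swap you are willing to burn) gives a quick feasibility check before starting all the servers.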



Marcy
"This message may contain confidential and/or privileged information. If you 
are not the addressee or authorized to receive this for the addressee, you must 
not use, copy, disclose, or take any action based on this message or any 
information herein. If you have received this message in error, please advise 
the sender immediately by reply e-mail and delete this message. Thank you for 
your cooperation."


-Original Message-
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel 
Tate
Sent: Tuesday, July 27, 2010 12:30 PM
To: LINUX-390@vm.marist.edu
Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?

Unfortunately it did not.  It continued to grow and grow until the OOM
killer fired in a loop.  We're supposed to have a max heap size of 512MB
per JVM.  The last time I did any WAS work was on Linux, but on version 3
when it was new, a long time ago.

top - 14:04:46 up  2:09,  5 users,  load average: 60.09, 18.03, 6.96
Tasks: 144 total,   2 running, 142 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.9%us, 67.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.1%si,  1.8%st
Mem:   8220492k total,  8183456k used,    37036k free,      680k buffers
Swap:   779880k total,   779880k used,        0k free,    54288k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4520 wasadmin  20   0 1178m 576m 4460 S  107  7.2  11:59.54 java
 8943 wasadmin  20   0  566m 254m 4372 S   15  3.2   2:11.86 java
10326 wasadmin  20   0  581m 220m 4224 S   15  2.7   2:01.12 java
 9678 wasadmin  20   0  563m 297m  20m S   15  3.7   2:11.54 java
 4368 wasadmin  20   0 1148m 603m 2848 S   13  7.5  11:55.56 java
 9321 wasadmin  20   0  597m 262m 4720 S   10  3.3   2:15.96 java
11107 wasadmin  20   0  555m 272m 4048 S   10  3.4   2:02.47 java
10706 wasadmin  20   0  509m 241m 2720 S    9  3.0   2:05.35 java
 4920 wasadmin  20   0  540m 228m 3296 S    8  2.8   2:20.82 java
 5298 wasadmin  20   0  586m 253m 2792 S    8  3.2   2:06.52 java
 7794 wasadmin  20   0  594m 307m 4644 S    7  3.8   2:06.89 java
 6021 wasadmin  20   0  542m 208m 2908 S    5  2.6   1:53.18 java
 6205 wasadmin  20   0  562m 226m 2980 S    5  2.8   1:57.61 java
 7265 wasadmin  20   0  517m 218m 2888 S    4  2.7   1:51.77 java
 9992 wasadmin  20   0  624m 223m 2896 S    4  2.8   2:07.83 java
 5481 wasadmin  20   0  707m 254m 4516 S    3  3.2   2:28.43 java
 6390 wasadmin  20   0  509m 167m 3672 S    3  2.1   2:13.46 java
 7516 wasadmin  20   0  531m 272m 2272 S    3  3.4   2:05.57 java
11502 wasadmin  20   0  483m 207m 2772 S    3  2.6   1:17.93 java
12248 wasadmin  20   0  520m 203m 3544 S    3  2.5   1:22.76 java
 6805 wasadmin  20   0  577m 254m 2980 S    3  3.2   2:08.65 java
 5657 wasadmin  20   0  572m 262m  22m S    1  3.3   2:12.52 java
 8615 wasadmin  20   0  574m 255m 2792 S    1  3.2   2:12.48 java
 5068 wasadmin  20   0  580m 243m 2560 S    0  3.0   2:11.30 java
 6586 wasadmin  20   0  580m 220m 3112 S    0  2.8   2:08.78 java
 7036 wasadmin  20   0  664m 244m 2504 S    0  3.0   2:02.78 java
 8066 wasadmin  20   0  578m 233m 3128 S    0  2.9   1:58.66 java
11924 wasadmin  20   0  542m 203m 4780 S    0  2.5   1:21.65 java
 4052 wasadmin  20   0  9968 2624 1188 S    0  0.0   0:00.15 bash
 5008 wasadmin  20   0  6900 1748 1028 S    0  0.0   0:23.92 top
 5181 wasadmin  20   0  9972 2724 1192 S    0  0.0   0:00.22 bash
 5839 wasadmin  20   0  696m 217m 2716 S    0  2.7   1:54.87 java
 8347 wasadmin  20   0  528m 242m  22m S    0  3.0   1:55.78 java





On Tue, Jul 27, 2010 at 10:26 AM, Marcy Cortes wrote:
> WAS calls non-heap memory "native".
> That's what you seem to be using up.
> Look at this http://www-01.ibm.com/support/docview.wss?uid=swg21373312
> And if these things don't help, open a PMR with WAS support.
> Did it stabilize at 18G?
> The panic_on_oom is a good idea - you can then get a dump (set up a linux 
> dump volume if you are in a hurry - the VMDUMP command takes all day :)
>
>
> Marcy
>
>
>
> -Original Message-
> From: Linux on 390 Po

Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-27 Thread Daniel Tate
Unfortunately it did not.  It continued to grow until the OOM killer
fired in a loop.  We're supposed to have a max size of 512MB per VM.
The last time I did any WAS work was on Linux, back when version 3 was
new... a long time ago.

top - 14:04:46 up  2:09,  5 users,  load average: 60.09, 18.03, 6.96
Tasks: 144 total,   2 running, 142 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.9%us, 67.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.1%si,  1.8%st
Mem:   8220492k total,  8183456k used,    37036k free,      680k buffers
Swap:   779880k total,   779880k used,        0k free,    54288k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4520 wasadmin  20   0 1178m 576m 4460 S  107  7.2  11:59.54 java
 8943 wasadmin  20   0  566m 254m 4372 S   15  3.2   2:11.86 java
10326 wasadmin  20   0  581m 220m 4224 S   15  2.7   2:01.12 java
 9678 wasadmin  20   0  563m 297m  20m S   15  3.7   2:11.54 java
 4368 wasadmin  20   0 1148m 603m 2848 S   13  7.5  11:55.56 java
 9321 wasadmin  20   0  597m 262m 4720 S   10  3.3   2:15.96 java
11107 wasadmin  20   0  555m 272m 4048 S   10  3.4   2:02.47 java
10706 wasadmin  20   0  509m 241m 2720 S    9  3.0   2:05.35 java
 4920 wasadmin  20   0  540m 228m 3296 S    8  2.8   2:20.82 java
 5298 wasadmin  20   0  586m 253m 2792 S    8  3.2   2:06.52 java
 7794 wasadmin  20   0  594m 307m 4644 S    7  3.8   2:06.89 java
 6021 wasadmin  20   0  542m 208m 2908 S    5  2.6   1:53.18 java
 6205 wasadmin  20   0  562m 226m 2980 S    5  2.8   1:57.61 java
 7265 wasadmin  20   0  517m 218m 2888 S    4  2.7   1:51.77 java
 9992 wasadmin  20   0  624m 223m 2896 S    4  2.8   2:07.83 java
 5481 wasadmin  20   0  707m 254m 4516 S    3  3.2   2:28.43 java
 6390 wasadmin  20   0  509m 167m 3672 S    3  2.1   2:13.46 java
 7516 wasadmin  20   0  531m 272m 2272 S    3  3.4   2:05.57 java
11502 wasadmin  20   0  483m 207m 2772 S    3  2.6   1:17.93 java
12248 wasadmin  20   0  520m 203m 3544 S    3  2.5   1:22.76 java
 6805 wasadmin  20   0  577m 254m 2980 S    3  3.2   2:08.65 java
 5657 wasadmin  20   0  572m 262m  22m S    1  3.3   2:12.52 java
 8615 wasadmin  20   0  574m 255m 2792 S    1  3.2   2:12.48 java
 5068 wasadmin  20   0  580m 243m 2560 S    0  3.0   2:11.30 java
 6586 wasadmin  20   0  580m 220m 3112 S    0  2.8   2:08.78 java
 7036 wasadmin  20   0  664m 244m 2504 S    0  3.0   2:02.78 java
 8066 wasadmin  20   0  578m 233m 3128 S    0  2.9   1:58.66 java
11924 wasadmin  20   0  542m 203m 4780 S    0  2.5   1:21.65 java
 4052 wasadmin  20   0  9968 2624 1188 S    0  0.0   0:00.15 bash
 5008 wasadmin  20   0  6900 1748 1028 S    0  0.0   0:23.92 top
 5181 wasadmin  20   0  9972 2724 1192 S    0  0.0   0:00.22 bash
 5839 wasadmin  20   0  696m 217m 2716 S    0  2.7   1:54.87 java
 8347 wasadmin  20   0  528m 242m  22m S    0  3.0   1:55.78 java





On Tue, Jul 27, 2010 at 10:26 AM, Marcy Cortes
 wrote:
> WAS calls non-heap memory "native".
> That's what you seem to be using up.
> Look at this http://www-01.ibm.com/support/docview.wss?uid=swg21373312
> And if these things don't help, open a PMR with WAS support.
> Did it stabilize at 18G?
> The panic_on_oom is a good idea - you can then get a dump (set up a linux 
> dump volume if you are in a hurry - the VMDUMP command takes all day :)
>
>
> Marcy
>
> "This message may contain confidential and/or privileged information. If you 
> are not the addressee or authorized to receive this for the addressee, you 
> must not use, copy, disclose, or take any action based on this message or any 
> information herein. If you have received this message in error, please advise 
> the sender immediately by reply e-mail and delete this message. Thank you for 
> your cooperation."
>
>
> -Original Message-
> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel 
> Tate
> Sent: Tuesday, July 27, 2010 7:53 AM
> To: LINUX-390@vm.marist.edu
> Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning 
> problems?
>
> Yes, but no memory ever gets freed.  The system stops responding and
> is too busy dumping information to the console screen to respond to
> anything at the console or network.
>
> And a system-wide panic would provide me with a core file to analyze
> and send off.  The progress we've made thanks to this group (setting
> swappiness = 0) has helped it remain stable longer: right now we're
> using 18GB of swap instead of the 35 we were before.  I haven't tried
> panic_on_oom, but that's a good idea and I was braindead not to think
> of it.
>
> On Mon, Jul 26, 2010 at 8:48 PM, Shane G  wrote:
>> Some more info please.
>> ... you get a OOM condition ?.
>> ... the/a large consumer gets killed ?
>> ... the system "halts" (expla

Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-27 Thread Marcy Cortes
WAS calls non-heap memory "native".  
That's what you seem to be using up.
Look at this http://www-01.ibm.com/support/docview.wss?uid=swg21373312
And if these things don't help, open a PMR with WAS support.
Did it stabilize at 18G?
The panic_on_oom is a good idea - you can then get a dump (set up a linux dump 
volume if you are in a hurry - the VMDUMP command takes all day :)
 

Marcy 



-Original Message-
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel 
Tate
Sent: Tuesday, July 27, 2010 7:53 AM
To: LINUX-390@vm.marist.edu
Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?

Yes, but no memory ever gets freed.  The system stops responding and
is too busy dumping information to the console screen to respond to
anything at the console or network.

And a system-wide panic would provide me with a core file to analyze
and send off.  The progress we've made thanks to this group (setting
swappiness = 0) has helped it remain stable longer: right now we're
using 18GB of swap instead of the 35 we were before.  I haven't tried
panic_on_oom, but that's a good idea and I was braindead not to think
of it.

On Mon, Jul 26, 2010 at 8:48 PM, Shane G  wrote:
> Some more info please.
> ... you get a OOM condition ?.
> ... the/a large consumer gets killed ?
> ... the system "halts" (explain) ?.
>
> You *want* a system-wide panic ?. If so, setting /proc/sys/vm/panic_on_oom to 
> "1" will
> have the desired effect on non zLinux.
>
> Shane ...
>
> On Tue, Jul 27th, 2010 at 1:24 AM, Daniel Tate  wrote:
>
>> When websphere starts, it consumes all the memory eventually and
>> halts, but not panics, the system.
>  ...
>> At this point i see two problems:
>>
>> 1) Why is OOM Kill not functioning properly
>
> --
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> --
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
>



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-27 Thread Daniel Tate
Yes, but no memory ever gets freed.  The system stops responding and
is too busy dumping information to the console screen to respond to
anything at the console or network.

And a system-wide panic would provide me with a core file to analyze
and send off.  The progress we've made thanks to this group (setting
swappiness = 0) has helped it remain stable longer: right now we're
using 18GB of swap instead of the 35 we were before.  I haven't tried
panic_on_oom, but that's a good idea and I was braindead not to think
of it.

On Mon, Jul 26, 2010 at 8:48 PM, Shane G  wrote:
> Some more info please.
> ... you get a OOM condition ?.
> ... the/a large consumer gets killed ?
> ... the system "halts" (explain) ?.
>
> You *want* a system-wide panic ?. If so, setting /proc/sys/vm/panic_on_oom to 
> "1" will
> have the desired effect on non zLinux.
>
> Shane ...
>
> On Tue, Jul 27th, 2010 at 1:24 AM, Daniel Tate  wrote:
>
>> When websphere starts, it consumes all the memory eventually and
>> halts, but not panics, the system.
>  ...
>> At this point i see two problems:
>>
>> 1) Why is OOM Kill not functioning properly
>
> --
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> --
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
>



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-27 Thread rodgerd
On Mon, 26 Jul 2010 10:24:06 -0500, Daniel Tate 
wrote:
> At this point i see two problems:
> 
> 1) Why is OOM Kill not functioning properly
> 2) Why is websphere performance so awful?

As a rule of thumb, if you run WAS (or any Java app) in swap, performance
will be terrible; if you've got 16GB allocated to your JVM heaps (for
example) and only 10GB for the guest, you're going to hit that wall sooner
or later.

(A secondary issue is that Z9 processors are, in my experience, much
slower than a modern x86 processor, so if you're expecting a Z9 IFL to
offer comparable performance to a Xeon, you are going to be very sadly
disappointed.)

As far as OOM goes, I've seen this situation on z/VM guests in the past,
usually with Oracle, and put it down to the VDISK swap being just fast
enough that the OOM killer is never triggered, even though the system is
unusable.  That's just a theory on my part, though.
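One way to test this theory is to sample total Java RSS against free memory and swap while the growth is happening; a minimal sketch (assumes GNU procps `ps` and `awk`, and is meant to be run repeatedly, e.g. via `watch` or cron):

```shell
# Sum the resident set of all java processes and show free memory/swap.
# Repeat while the leak develops to see whether RSS keeps climbing even
# though the OOM killer never fires.
ps -o rss= -C java | awk '{s += $1} END {printf "java RSS total: %d kB\n", s + 0}'
grep -E '^(MemFree|SwapFree)' /proc/meminfo
```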



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Shane G
Some more info please.
... you get a OOM condition ?.
... the/a large consumer gets killed ?
... the system "halts" (explain) ?.

You *want* a system-wide panic ?. If so, setting /proc/sys/vm/panic_on_oom to 
"1" will 
have the desired effect on non zLinux.
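Shane's tunable can be inspected and set like this (a sketch; the writes require root, and on z/VM you would pair it with a dump device so the panic actually produces something analyzable):

```shell
# 0 = OOM killer picks a victim task; 1 = panic the whole system on OOM
cat /proc/sys/vm/panic_on_oom
# To enable (root required) -- uncomment one of:
# sysctl -w vm.panic_on_oom=1
# echo 1 > /proc/sys/vm/panic_on_oom
# To persist across reboots:
# echo "vm.panic_on_oom = 1" >> /etc/sysctl.conf
```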

Shane ...

On Tue, Jul 27th, 2010 at 1:24 AM, Daniel Tate  wrote:

> When websphere starts, it consumes all the memory eventually and
> halts, but not panics, the system.
 ...
> At this point i see two problems:
> 
> 1) Why is OOM Kill not functioning properly



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Daniel Tate
 : slabdata      0      0      0
size-2048            170    170   2048    2    1 : tunables   24   12    8 : slabdata     85     85      0
size-1024(DMA)        19     20   1024    4    1 : tunables   54   27    8 : slabdata      5      5      0
size-1024            893    928   1024    4    1 : tunables   54   27    8 : slabdata    232    232      0
size-512(DMA)         18     32    512    8    1 : tunables   54   27    8 : slabdata      4      4      0
size-512             576    576    512    8    1 : tunables   54   27    8 : slabdata     72     72      0
size-256(DMA)          3     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
size-256           16444  16455    256   15    1 : tunables  120   60    8 : slabdata   1097   1097      0
size-128(DMA)          0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
size-64(DMA)          23     59     64   59    1 : tunables  120   60    8 : slabdata      1      1      0
size-64             3532   6608     64   59    1 : tunables  120   60    8 : slabdata    112    112      0
size-32(DMA)           4    112     32  112    1 : tunables  120   60    8 : slabdata      1      1      0
size-128             942   1110    128   30    1 : tunables  120   60    8 : slabdata     37     37      0
size-32             3261   3360     32  112    1 : tunables  120   60    8 : slabdata     30     30      0
kmem_cache           153    155    768    5    1 : tunables   54   27    8 : slabdata     31     31      0

On Mon, Jul 26, 2010 at 6:17 PM, Marcy Cortes
 wrote:
> Do you run at all before you run out of memory?  Have you tried just starting 
> one of the apps at a time and seeing what each seems to do to memory usage?  
> Could it be that one of them has a leak?
>
> I think all the tweaks are really app dependent.  You'll have to measure with
> both a good VM performance monitor and a good Java monitor.
> Our largest app finds gencon garbage collection much more efficient (it uses
> more than 10% less CPU for them).  Another found that using async I/O in WAS
> exhibited native memory-leak symptoms...  But other than that, we don't have
> specific tweaks that we do for every app.
>
>
> Marcy
>
>
> -Original Message-
> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel 
> Tate
> Sent: Monday, July 26, 2010 3:57 PM
> To: LINUX-390@vm.marist.edu
> Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning 
> problems?
>
> We set swappiness to 0 and started all servers.  We will see if that helps
> and I will look into the SP level upgrade.  Does anyone have any good tweaks
> for z WebSphere by chance?
>
> On Jul 26, 2010 4:59 PM, "Marcy Cortes" 
> wrote:
>> Thanks for that clarification!
>>
>>
>> Marcy
>>
>> -Original Message-
>> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark
> Post
>> Sent: Monday, July 26, 2010 2:21 PM
>> To: LINUX-390@vm.marist.edu
>> Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning
> problems?
>>
>>>>> On 7/26/2010 at 05:05 PM, Marcy Cortes 
> wrote:
>>> I was going to suggest a dump and a ticket to Novell, but it looks like
> you
>>> aren't SP1, and so are unsupported.
>>
>> SLES11 GA is fully supported until 6 months after SP1 went GA. Even after
> that, NTS supports the product line throughout its life. We just can't get
> Level 3 to look at a problem after 6 months, which means bug fixes won't be
> created for a customer unless it's against SP1.
>>
>>
>> Mark Post
>>
>> --
>> For LINUX-390 subscribe / signoff / archive access instructions,
>> send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or
> visit
>> http://www.marist.edu/htbin/wlvindex?LINUX-390
>> --
>> For more information on Linux on System z, visit
>> http://wiki.linuxvm.org/
>>
>> --
>> For LINUX-390 subscribe / signoff / archive access instructions,
>> send email to lists...@vm.marist.edu with the message: INFO LINUX-390 

Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Phil Tully
Hello.  I see you are running 64-bit WAS on Linux; are you running the
32-bit version on Windows?


How close is the memory utilization on the Windows machine?

regards
Phil Tully

On 7/26/2010 12:07 PM, Marcy Cortes wrote:

First of all, you've run out of memory on that server (Swap: 35764956k total, 
35764956k used,)
It ate all of the 10G and all of the 35G of swap.
How many JVM's are running and what are their min/max heap sizes?



Marcy



-Original Message-
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel 
Tate
Sent: Monday, July 26, 2010 8:24 AM
To: LINUX-390@vm.marist.edu
Subject: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?

We're running WebSphere on a z9 under z/VM; 4 systems are live out of 8.  It
is running apps that consume around 16GB of memory on a Windows machine.  On
this, we have allocated 10GB of real storage (RAM) and around 35GB of
swap.  When WebSphere starts, it eventually consumes all the memory and
halts (but does not panic) the system.  We are running 64-bit.  I'm a z/VM
novice, so I don't know much to do...

Here is some information from our WAS Admin:
"We are running WebSphere 6.1.0.25 with FP EJB3.0,Webservices and Web 2.0
installed.  There are two nodes running 14 application servers each. there
are currently 32 applications installed but not currently running.  No
security has been enabled for WebSphere at this time."


At this point I see two problems:

1) Why is OOM kill not functioning properly?
2) Why is WebSphere performance so awful?

and I have two questions:

1) Does anyone have any PRACTICAL experience/tips to optimize SLES11 on
z/VM?  So far we've been using dated case studies and Redbooks that seem to
be filled with inaccuracies or outdated information.
2) Is there any way to force a coredump via CP, like you can with the
magic SysRq?
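On the Linux side, the magic SysRq crash trigger the question refers to looks like this (a sketch; triggering it from CP would go through console input, which is worth checking against the z/VM documentation rather than guessing here):

```shell
# Show whether magic SysRq is enabled (0 = disabled, 1 = all functions)
cat /proc/sys/kernel/sysrq
# Root-only, and 'c' crashes the box immediately -- uncomment to use:
# echo 1 > /proc/sys/kernel/sysrq     # enable SysRq
# echo c > /proc/sysrq-trigger        # force a crash for a kernel dump
```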

All systems are running the same release and patch level:

[root] bwzld001:~# lsb_release -a
LSB Version:
core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
Distributor ID: SUSE LINUX
Description:    SUSE Linux Enterprise Server 11 (s390x)
Release:        11
Codename:       n/a


Here is a partial top shortly before system death:

top - 08:13:14 up 2 days, 16:08,  2 users,  load average: 51.47, 22.20, 10.25
Tasks: 129 total,   4 running, 125 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.7%us, 81.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.3%si,  1.2%st
Mem:  10268344k total, 10220568k used,    47776k free,      548k buffers
Swap: 35764956k total, 35764956k used,        0k free,    56340k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
26850 wasadmin  20   0 1506m 253m 2860 S   18  2.5  16:06.28 java
29870 wasadmin  20   0 1497m 279m 2560 S   15  2.8  15:41.13 java
24607 wasadmin  20   0 1502m 223m 2760 S   13  2.2  16:15.14 java
24641 wasadmin  20   0 7229m 1.3g 3172 S   13 13.1 196:35.52 java
26606 wasadmin  20   0 1438m 272m 6212 S   12  2.7  16:02.77 java
27600 wasadmin  20   0 1553m 258m 2920 S   12  2.6  15:46.57 java
24638 wasadmin  20   0 7368m 1.3g  24m S   10 13.7 206:02.05 java
25609 wasadmin  20   0 1528m 219m 2540 S    9  2.2  16:07.33 java
30258 wasadmin  20   0 1515m 249m 2592 S    7  2.5  15:49.79 java
25780 wasadmin  20   0 1604m 277m 2332 S    6  2.8  16:31.41 java
27106 wasadmin  20   0 1458m 273m 2472 S    6  2.7  15:59.13 java
27336 wasadmin  20   0 1528m 238m 2540 S    5  2.4  15:38.82 java
29164 wasadmin  20   0 1527m 224m 2608 S    5  2.2  16:02.56 java
31400 wasadmin  20   0 1509m 259m 2468 S    5  2.6  15:26.38 java
25244 wasadmin  20   0 1509m 290m 2624 S    5  2.9  16:16.07 java
24769 wasadmin  20   0 1409m 259m 2308 S    5  2.6  16:08.12 java
28796 wasadmin  20   0 1338m 263m 3076 S    4  2.6  15:47.72 java
26185 wasadmin  20   0 1493m 274m 2304 S    2  2.7  16:01.97 java
25968 wasadmin  20   0 1427m 257m 2532 S    1  2.6  15:51.50 java
29495 wasadmin  20   0 1466m 259m 2260 S    1  2.6  15:31.82 java
25080 wasadmin  20   0 1445m 236m 2472 S    0  2.4  15:53.19 java
26410 wasadmin  20   0 1475m 271m 2540 S    0  2.7  15:52.48 java
31027 wasadmin  20   0 1413m 238m 2492 S    0  2.4  15:29.78 java
 3695 wasadmin  20   0  9968 1352 1352 S    0  0.0   0:00.13 bash
24474 wasadmin  20   0 1468m 205m 2472 S    0  2.0  16

Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Marcy Cortes
Do you run at all before you run out of memory?  Have you tried just starting
one of the apps at a time and seeing what each seems to do to memory usage?
Could it be that one of them has a leak?

I think all the tweaks are really app dependent.  You'll have to measure with
both a good VM performance monitor and a good Java monitor.
Our largest app finds gencon garbage collection much more efficient (it uses
more than 10% less CPU for them).  Another found that using async I/O in WAS
exhibited native memory-leak symptoms...  But other than that, we don't have
specific tweaks that we do for every app.
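For reference, gencon is selected through a server's generic JVM arguments on the IBM J9 JVM; a sketch (the heap and log values are illustrative, chosen to match the 50/256 defaults mentioned in this thread):

```
-Xgcpolicy:gencon
-Xms50m -Xmx256m
-Xverbosegclog:gc.log
```

In the WAS 6.1 admin console these go under the server's Process definition > Java Virtual Machine settings; the verbose GC log helps separate heap growth from the native growth discussed above.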


Marcy 


-Original Message-
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel 
Tate
Sent: Monday, July 26, 2010 3:57 PM
To: LINUX-390@vm.marist.edu
Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?

We set swappiness to 0 and started all servers.  We will see if that helps
and I will look into the SP level upgrade.  Does anyone have any good tweaks
for z WebSphere by chance?

On Jul 26, 2010 4:59 PM, "Marcy Cortes" 
wrote:
> Thanks for that clarification!
>
>
> Marcy
>
> -Original Message-
> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark
Post
> Sent: Monday, July 26, 2010 2:21 PM
> To: LINUX-390@vm.marist.edu
> Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning
problems?
>
>>>> On 7/26/2010 at 05:05 PM, Marcy Cortes 
wrote:
>> I was going to suggest a dump and a ticket to Novell, but it looks like
you
>> aren't SP1, and so are unsupported.
>
> SLES11 GA is fully supported until 6 months after SP1 went GA. Even after
that, NTS supports the product line throughout its life. We just can't get
Level 3 to look at a problem after 6 months, which means bug fixes won't be
created for a customer unless it's against SP1.
>
>
> Mark Post
>
> --
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or
visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> --
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
>
> --
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or
visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> --
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Daniel Tate
We set swappiness to 0 and started all servers.  We will see if that helps
and I will look into the SP level upgrade.  Does anyone have any good tweaks
for z WebSphere by chance?

On Jul 26, 2010 4:59 PM, "Marcy Cortes" 
wrote:
> Thanks for that clarification!
>
>
> Marcy
>
> -Original Message-
> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark
Post
> Sent: Monday, July 26, 2010 2:21 PM
> To: LINUX-390@vm.marist.edu
> Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning
problems?
>
>>>> On 7/26/2010 at 05:05 PM, Marcy Cortes 
wrote:
>> I was going to suggest a dump and a ticket to Novell, but it looks like
you
>> aren't SP1, and so are unsupported.
>
> SLES11 GA is fully supported until 6 months after SP1 went GA. Even after
that, NTS supports the product line throughout its life. We just can't get
Level 3 to look at a problem after 6 months, which means bug fixes won't be
created for a customer unless it's against SP1.
>
>
> Mark Post
>
> --
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or
visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> --
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
>
> --
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or
visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> --
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Marcy Cortes
Thanks for that clarification! 


Marcy 

-Original Message-
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Mark Post
Sent: Monday, July 26, 2010 2:21 PM
To: LINUX-390@vm.marist.edu
Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?

>>> On 7/26/2010 at 05:05 PM, Marcy Cortes  
>>> wrote: 
> I was going to suggest a dump and a ticket to Novell, but it looks like you 
> aren't SP1, and so are unsupported.

SLES11 GA is fully supported until 6 months after SP1 went GA.  Even after 
that, NTS supports the product line throughout its life.  We just can't get 
Level 3 to look at a problem after 6 months, which means bug fixes won't be 
created for a customer unless it's against SP1.


Mark Post



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Mark Post
>>> On 7/26/2010 at 05:05 PM, Marcy Cortes  
>>> wrote: 
> I was going to suggest a dump and a ticket to Novell, but it looks like you 
> aren't SP1, and so are unsupported.

SLES11 GA is fully supported until 6 months after SP1 went GA.  Even after 
that, NTS supports the product line throughout its life.  We just can't get 
Level 3 to look at a problem after 6 months, which means bug fixes won't be 
created for a customer unless it's against SP1.


Mark Post



Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Marcy Cortes

I was going to suggest a dump and a ticket to Novell, but it looks like you 
aren't SP1, and so are unsupported.
Anyway you could apply that and try again?


Marcy



-Original Message-
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Daniel 
Tate
Sent: Monday, July 26, 2010 11:28 AM
To: LINUX-390@vm.marist.edu
Subject: Re: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?

Yeah, I saw that... the problem is these same apps run in 16GB of memory on a
Windows box.

We have 28 JVMs and sizes are set to 50/256.
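A quick back-of-the-envelope check shows why 28 JVMs overcommit a 10GB guest even with small heaps (a sketch; the ~250MB of native overhead per JVM is an assumption read off the RES column in the top output earlier in the thread, not a measured figure):

```shell
# 28 JVMs x (256 MB max heap + ~250 MB assumed native: JIT, classes, threads)
awk 'BEGIN { printf "worst-case footprint: %.1f GB\n", 28 * (256 + 250) / 1024 }'
# -> roughly 13.8 GB against 10 GB of real storage
```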

On Mon, Jul 26, 2010 at 11:07 AM, Marcy Cortes <
marcy.d.cor...@wellsfargo.com> wrote:

> First of all, you've run out of memory on that server (Swap: 35764956k
> total, 35764956k used,)
> It ate all of the 10G and all of the 35G of swap.
> How many JVM's are running and what are their min/max heap sizes?
>
>
>
> Marcy
>
>
>
> -Original Message-
> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of
> Daniel Tate
> Sent: Monday, July 26, 2010 8:24 AM
> To: LINUX-390@vm.marist.edu
> Subject: [LINUX-390] OOM Condition on SLES11 running WAS - Tuning problems?
>
> We're running WebSphere on a z9 under z/VM; 4 systems are live out of 8.  It
> is running apps that consume around 16GB of memory on a Windows machine.  On
> this, we have allocated 10GB of real storage (RAM) and around 35GB of
> swap.  When WebSphere starts, it eventually consumes all the memory and
> halts (but does not panic) the system.  We are running 64-bit.  I'm a z/VM
> novice, so I don't know much to do...
>
> Here is some information from our WAS Admin:
> "We are running WebSphere 6.1.0.25 with FP EJB3.0,Webservices and Web 2.0
> installed.  There are two nodes running 14 application servers each. there
> are currently 32 applications installed but not currently running.  No
> security has been enabled for WebSphere at this time."
>
>
> At this point i see two problems:
>
> 1) Why is OOM Kill not functioning properly
> 2) Why is websphere performance so awful?
>
> and have two questions
>
> 1) Does anyone have any PRACTICAL experience/tips to optimize SLES11 on
> z/VM?  So far we've been using dated case studies and redbooks that seem to
> be filled with inaccuracies or outdated information.
> 2) Is there any way to force a coredump via the cp, like you can with the
> magic sysrq?
>
> All systems are running the same release and patch level:
>
> [root] bwzld001:~# lsb_release -a
> LSB Version:
>
> core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
> Distributor ID:SUSE LINUX
> Description:SUSE Linux Enterprise Server 11 (s390x)
> Release:11
> Codename:n/a
>
>
> Here is a partial top shortly before system death:
>
> top - 08:13:14 up 2 days, 16:08,  2 users,  load average: 51.47, 22.20,
> 10.25
> Tasks: 129 total,   4 running, 125 sleeping,   0 stopped,   0 zombie
> Cpu(s): 16.7%us, 81.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.3%si,
> 1.2%st
> Mem:  10268344k total, 10220568k used,47776k free,  548k buffers
> Swap: 35764956k total, 35764956k used,0k free,56340k cached
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
> COMMAND
>
> 26850 wasadmin  20   0 1506m 253m 2860 S   18  2.5  16:06.28
> java
> 29870 wasadmin  20   0 1497m 279m 2560 S   15  2.8  15:41.13
> java
> 24607 wasadmin  20   0 1502m 223m 2760 S   13  2.2  16:15.14
> java
> 24641 wasadmin  20   0 7229m 1.3g 3172 S   13 13.1 196:35.52
> java
> 26606 wasadmin  20   0 1438m 272m 6212 S   12  2.7  16:02.77
> java
> 27600 wasadmin  20   0 1553m 258m 2920 S   12  2.6

Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Mrohs, Ray
Set swappiness to 0. Can you just start 1 node as a test?

Ray
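
[For anyone following along: the knob Ray is referring to is vm.swappiness. A minimal sketch of checking and changing it — the sysctl.conf path is the standard one on SLES11; adjust for your setup:]

```shell
# Show the current value; the kernel default is 60.
cat /proc/sys/vm/swappiness

# Lower it on the running system (as root):
#   sysctl -w vm.swappiness=0
# Make it persistent across reboots:
#   echo "vm.swappiness = 0" >> /etc/sysctl.conf
```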

> -Original Message-
> From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On
> Behalf Of Daniel Tate
> Sent: Monday, July 26, 2010 2:28 PM
> To: LINUX-390@VM.MARIST.EDU
> Subject: Re: OOM Condition on SLES11 running WAS - Tuning problems?
>
> Yeah, i saw that.. problem is these same apps run on 16GB of mem on a
> windows box..
>
> We have 28 JVMs and sizes are set to 50/256.
>
> On Mon, Jul 26, 2010 at 11:07 AM, Marcy Cortes <
> marcy.d.cor...@wellsfargo.com> wrote:
>
> > First of all, you've run out of memory on that server
> (Swap: 35764956k
> > total, 35764956k used,)
> > It ate all of the 10G and all of the 35G of swap.
> > How many JVM's are running and what are their min/max heap sizes?

Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Daniel Tate
Yeah, I saw that. The problem is that these same apps run in 16GB of memory on a
Windows box.

We have 28 JVMs and sizes are set to 50/256.
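
[A quick back-of-the-envelope check on those numbers. The per-JVM VIRT figure below is an assumption read off the top output in this thread, and VIRT overcounts shared pages, so treat the second number as an upper bound:]

```shell
# 28 JVMs at -Xmx256m: maximum Java heap alone, in GB.
echo $(( 28 * 256 / 1024 ))    # 7

# 28 JVMs at ~1450 MB VIRT each (assumed from top), in GB.
echo $(( 28 * 1450 / 1024 ))   # 39
```

So even with empty heaps, the native footprint of 28 server JVMs can plausibly exhaust 10GB of real storage on its own.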

On Mon, Jul 26, 2010 at 11:07 AM, Marcy Cortes <
marcy.d.cor...@wellsfargo.com> wrote:

> First of all, you've run out of memory on that server (Swap: 35764956k
> total, 35764956k used,)
> It ate all of the 10G and all of the 35G of swap.
> How many JVM's are running and what are their min/max heap sizes?
>
>
>
> Marcy

Re: OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Marcy Cortes
First of all, you've run out of memory on that server (Swap: 35764956k total, 35764956k used).
It ate all of the 10G and all of the 35G of swap.
How many JVM's are running and what are their min/max heap sizes?



Marcy 

“This message may contain confidential and/or privileged information. If you 
are not the addressee or authorized to receive this for the addressee, you must 
not use, copy, disclose, or take any action based on this message or any 
information herein. If you have received this message in error, please advise 
the sender immediately by reply e-mail and delete this message. Thank you for 
your cooperation."



OOM Condition on SLES11 running WAS - Tuning problems?

2010-07-26 Thread Daniel Tate
We're running WebSphere on a z9 under z/VM; 4 of the 8 systems are live. It is
running apps that consume around 16GB of memory on a Windows machine. On this,
we have allocated 10GB of real storage (RAM) and around 35GB of swap. When
WebSphere starts, it eventually consumes all the memory and halts (but does not
panic) the system. We are running 64-bit. I'm a z/VM novice, so I don't know
what else to try.

Here is some information from our WAS Admin:
"We are running WebSphere 6.1.0.25 with the EJB 3.0, Web Services, and Web 2.0
feature packs installed. There are two nodes running 14 application servers
each. There are currently 32 applications installed but not currently running.
No security has been enabled for WebSphere at this time."


At this point I see two problems:

1) Why is the OOM killer not functioning properly?
2) Why is WebSphere performance so awful?

and have two questions:

1) Does anyone have any PRACTICAL experience/tips for optimizing SLES11 on
z/VM? So far we've been using dated case studies and Redbooks that seem to
be filled with inaccuracies or outdated information.
2) Is there any way to force a core dump via CP, like you can with the
magic SysRq?
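
[On question 2, a sketch of the usual options. The CP command forms below are from memory — verify the exact operands against the CP Commands reference for your z/VM level before relying on them:]

```shell
# Magic SysRq from inside Linux first needs to be enabled:
cat /proc/sys/kernel/sysrq

# A crash can then be forced from Linux itself (root, destructive!):
#   echo c > /proc/sysrq-trigger
# From the 3270 console, CP can snapshot or restart the guest; approximate
# forms (operands are assumptions -- check your z/VM documentation):
#   #CP VMDUMP 0-END        dump guest storage for later analysis
#   #CP SYSTEM RESTART      load the restart PSW (can drive a dump tool)
```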

All systems are running the same release and patch level:

[root] bwzld001:~# lsb_release -a
LSB Version:
core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-s390x:core-3.2-s390x:core-4.0-s390x:desktop-4.0-noarch:desktop-4.0-s390:desktop-4.0-s390x:graphics-2.0-noarch:graphics-2.0-s390:graphics-2.0-s390x:graphics-3.2-noarch:graphics-3.2-s390:graphics-3.2-s390x:graphics-4.0-noarch:graphics-4.0-s390:graphics-4.0-s390x
Distributor ID:SUSE LINUX
Description:SUSE Linux Enterprise Server 11 (s390x)
Release:11
Codename:n/a


Here is a partial top shortly before system death:

top - 08:13:14 up 2 days, 16:08,  2 users,  load average: 51.47, 22.20, 10.25
Tasks: 129 total,   4 running, 125 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.7%us, 81.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.3%hi,  0.3%si,  1.2%st
Mem:  10268344k total, 10220568k used, 47776k free, 548k buffers
Swap: 35764956k total, 35764956k used, 0k free, 56340k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
26850 wasadmin  20   0 1506m 253m 2860 S   18  2.5  16:06.28 java
29870 wasadmin  20   0 1497m 279m 2560 S   15  2.8  15:41.13 java
24607 wasadmin  20   0 1502m 223m 2760 S   13  2.2  16:15.14 java
24641 wasadmin  20   0 7229m 1.3g 3172 S   13 13.1 196:35.52 java
26606 wasadmin  20   0 1438m 272m 6212 S   12  2.7  16:02.77 java
27600 wasadmin  20   0 1553m 258m 2920 S   12  2.6  15:46.57 java
24638 wasadmin  20   0 7368m 1.3g  24m S   10 13.7 206:02.05 java
25609 wasadmin  20   0 1528m 219m 2540 S    9  2.2  16:07.33 java
30258 wasadmin  20   0 1515m 249m 2592 S    7  2.5  15:49.79 java
25780 wasadmin  20   0 1604m 277m 2332 S    6  2.8  16:31.41 java
27106 wasadmin  20   0 1458m 273m 2472 S    6  2.7  15:59.13 java
27336 wasadmin  20   0 1528m 238m 2540 S    5  2.4  15:38.82 java
29164 wasadmin  20   0 1527m 224m 2608 S    5  2.2  16:02.56 java
31400 wasadmin  20   0 1509m 259m 2468 S    5  2.6  15:26.38 java
25244 wasadmin  20   0 1509m 290m 2624 S    5  2.9  16:16.07 java
24769 wasadmin  20   0 1409m 259m 2308 S    5  2.6  16:08.12 java
28796 wasadmin  20   0 1338m 263m 3076 S    4  2.6  15:47.72 java
26185 wasadmin  20   0 1493m 274m 2304 S    2  2.7  16:01.97 java
25968 wasadmin  20   0 1427m 257m 2532 S    1  2.6  15:51.50 java
29495 wasadmin  20   0 1466m 259m 2260 S    1  2.6  15:31.82 java
25080 wasadmin  20   0 1445m 236m 2472 S    0  2.4  15:53.19 java
26410 wasadmin  20   0 1475m 271m 2540 S    0  2.7  15:52.48 java
31027 wasadmin  20   0 1413m 238m 2492 S    0  2.4  15:29.78 java
 3695 wasadmin  20   0  9968 1352 1352 S    0  0.0   0:00.13 bash
24474 wasadmin  20   0 1468m 205m 2472 S    0  2.0  16:03.63 java
24920 wasadmin  20   0 1522m 263m 2616 S    0  2.6  16:06.29 java
25422 wasadmin  20   0 1584m 229m 2284 S    0  2.3  16:02.18 java
27892 wasadmin  20   0 1414m 263m 2648 S    0  2.6  15:45.96 java
28184 wasadmin  20   0 1523m 241m 2320 S    0  2.4  15:42.21 java
28486 wasadmin  20   0 1450m 231m 2288 S    0  2.3  15:46.53 java
30625 wasadmin  20   0 1477m 251m 3024 S    0  2.5  15:44.80 java

-


Here are a few screen grabs from the 3270 console session:

Unless you get a _continuous_flood_ of these messages it means
everything is working fine. Allocations from irqs cannot be
perfectly reliable and the kernel is designed to handle that.
java: page allocation failure. order:0, mode:0x20, alloc_flags:0x7, pflags:0x400040
CPU: 1 Not tainted 2.6.27.45-0.1-default #1
Process java (pid: 28831, task: 0001ab64c638, ksp: 000215bbb5e0)
 00027fbcf7b0 0002 
   00027fbcf850 00027fbcf7c8 00027fbcf7c8 003b6696
   014a4e88 0007 00634e00 
   000d  00027fbcf818 0