Re: linux performance behind load balancer

2007-09-16 Thread Rob van der Heij
On 9/14/07, Mark Post [EMAIL PROTECTED] wrote:

 It was Rob working with me on the Linux/390 wiki system that led him to the 
 discovery that the IBM JDK was issuing 10ms sleeps.  It wasn't just in the 
 newer versions of the JDK, it was in the 1.4.2 ones as well.  So, upgrading 
 to a newer version of WAS and its associated Java shouldn't be any worse in 
 that regard.

He's back now... ;-)  There are issues in some later JDK levels (not
yet in 1.4.2), and Velocity Software has a bypass for it - IBM is working
on getting the bypass supported and eventually fixing the problem.
And there are problems in applications. In the Open Source Java
application Mark and I looked at, I could identify the cause and
repair it. With later versions of WAS (after 4.5) there are similar
problems. And with closed source like WAS I can only identify the
problem and yell at developers, who may listen but do not understand.

Because each Java application seems to ship with its own copy of the
JVM, it's not always easy to tell whether it's the JVM or the
application that is at fault. But where the problem lies may not be all
that relevant unless you're in a position to fix it...

Oh, and part of IBM's DB2 also has such problems, even though it's not in Java.

IMHO the cause is a skills problem with the developers. Poorly written
code that they can get away with on discrete servers is barely noticed
on simple virtualization like VMware, but it hurts scalability when you
get to the level of sharing we can achieve with z/VM.

This is not easy stuff. Although Barton says people don't want to hear
difficult explanations, I may need to sit down and write about it.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/



Re: linux performance behind load balancer

2007-09-14 Thread Rob van der Heij
On 9/14/07, Alan Altmark [EMAIL PROTECTED] wrote:

 But the importance of that depends on what you want to know, doesn't it?
 If you're interested in which Linux process is hogging the guest, the
 absolute number is irrelevant.

But if you're comparing usage before and after some configuration
change, it does become important. Simply because you generate load in
other virtual machines that were idle before, Linux will think the real
business process takes more CPU resources per transaction than before.
And Linux tools will tell you so, even though it is not true. That's
what may make you bark up the wrong tree.

But you *are* very right that a performance monitor should be part of
your Proof of Concept. We don't see a PoC fail these days because
software does not work or cannot be found. We see it fail because
people suffer from poor performance and have nobody to turn to for
help in that new environment. So they bring in folks with Linux skills
on discrete servers, and those folks do all the wrong things because
tuning Linux on z/VM is rarely intuitive. Or they use the wrong tools
to measure and draw the wrong conclusions about the capacity of their
installation and their TCO if they grow it.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/



Re: linux performance behind load balancer

2007-09-14 Thread Evans, Kevin R
Rob,

As we are just switching to Omegamon and almost up to implementation of
our first user to come into a new zLinux front end, can you give any
further details on your comment below?

Thanks

Kevin

-Original Message-
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
Rob van der Heij
Sent: Thursday, September 13, 2007 4:07 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: linux performance behind load balancer

On 9/13/07, Alan Altmark [EMAIL PROTECTED] wrote:

 Finishing the thought, IBM's OMEGAMON comes to mind as well.  There's more
 than one decent performance monitor Out There, so shop and compare.

But since that will present incorrect CPU breakdown per Linux process,
it may lead to wrong conclusions. ESALPS will correct the CPU usage
for virtualization effects.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/




Re: linux performance behind load balancer

2007-09-14 Thread Alan Altmark
On Friday, 09/14/2007 at 02:19 EDT, Rob van der Heij
[EMAIL PROTECTED] wrote:
 But you *are* very right that a performance monitor should be part of
 your Proof of Concept. We don't see a PoC fail these days because
 software does not work or cannot be found. We see it fail because
 people suffer from poor performance and have nobody to turn to for
 help in that new environment. So they bring in folks with Linux skills
 on discrete servers, and those folks do all the wrong things because
 tuning Linux on z/VM is rarely intuitive. Or they use the wrong tools
 to measure and draw the wrong conclusions about the capacity of their
 installation and their TCO if they grow it.

Amen.

Alan Altmark
z/VM Development
IBM Endicott



Re: linux performance behind load balancer

2007-09-14 Thread Mark Post
 On Fri, Sep 14, 2007 at  6:48 AM, in message
[EMAIL PROTECTED], Evans, Kevin
R [EMAIL PROTECTED] wrote: 
 Rob,
 
 As we are just switching to Omegamon and almost up to implementation of
 our first user to come into a new zLinux front end, can you give any
 further details on your comment below?

Prior to the kernels used in SLES10 and RHEL5, the way CPU consumption was 
tracked by the Linux kernel didn't take into account that the system may be 
running in a shared/virtualized environment.  The (valid until LPARs, z/VM, 
VMware, and Xen) assumption in place was that the kernel was in complete 
control of the hardware, so any passage of time between the last clock value 
taken, and the current one, was assigned to whatever process was dispatched in 
the interval.  The problem being, of course, that the virtual machine/LPAR 
might not have been running at all during that time.  So, Linux could report 
that the CPU was 100% busy, when in fact it was only being dispatched, for 
example, 3% of the time.
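
To make that concrete, here is a deliberately simplified sketch (in Python,
not the actual kernel code) of how per-tick accounting over-charges a guest;
the 3% dispatch figure is just the example number from above:

    # Naive tick-based accounting charges every 10ms tick to whatever
    # process was running, even though the hypervisor may have dispatched
    # the virtual CPU for only a fraction of that interval.
    TICK_MS = 10              # classic 100 Hz timer tick
    DISPATCH_FRACTION = 0.03  # guest really ran only 3% of wall-clock time
    TICKS = 1000              # ten seconds of wall-clock time

    charged_ms = 0.0          # what Linux thinks the process used
    ran_ms = 0.0              # what the guest actually got from the hypervisor
    for _ in range(TICKS):
        charged_ms += TICK_MS
        ran_ms += TICK_MS * DISPATCH_FRACTION

    wall_ms = TICKS * TICK_MS
    print("Linux-reported CPU busy: %3.0f%%" % (100.0 * charged_ms / wall_ms))
    print("Actual dispatch time:    %3.0f%%" % (100.0 * ran_ms / wall_ms))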

Of the various performance monitors that were being marketed for mainframe 
Linux, only Velocity Software's product combined the Linux data with the z/VM 
monitor data, and normalized the Linux values to be correct.  (Obviously this 
only worked in an environment where z/VM was being used as the hypervisor.)  
This was a big factor in many cases of which monitor to choose.  Since the 
release of the cpu accounting patches, and incorporation into SLES and RHEL, 
that's no longer the case, unless you're still running SLES8 (Hi, Marcy!) and 
SLES9 (Hi, almost everyone else!), or RHEL3 or 4.  Now the decision is based on 
more traditional criteria, as opposed to being right or very wrong.
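
For what it's worth, the normalization idea is simple enough to sketch
(illustrative Python only; the process names and numbers are invented, and
this is not Velocity's or IBM's actual code): keep the per-process
proportions Linux reports, but scale them to the CPU time the z/VM monitor
says the guest really received.

    linux_reported = {        # CPU seconds each process claims (inflated)
        "oracle": 52.0,
        "httpd":  30.0,
        "java":   18.0,
    }
    zvm_guest_cpu = 3.0       # CPU seconds z/VM measured for the whole guest

    linux_total = sum(linux_reported.values())
    for proc, secs in linux_reported.items():
        normalized = secs / linux_total * zvm_guest_cpu
        print("%-8s %5.2f CPU seconds (was reported as %5.2f)"
              % (proc, normalized, secs))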

If you have a userid and password to access the SHARE proceedings, you can see 
Martin Schwidefsky's presentation on this at
http://www.share.org/member_center/open_document.cfm?document=proceedings/SHARE_in_Seattle/S9266XX172938.pdf

(I have no idea why I didn't ask Martin for a copy of that for the linuxvm.org 
web site.  Rats.)


Mark Post



Re: linux performance behind load balancer

2007-09-14 Thread Marcy Cortes
(Hi Mark!)

That's the disadvantage of starting before everyone else and having too
many servers :)
At least I've killed the sles7's!

The problem with sles8 to sles9x is it's a new server.  That requires
the cooperation of the users.  They don't like to do that if everything
is all hunky dory.  They have other things to do (so they tell me).

I'm hoping sles9x to sles10x is a true upgrade and we can do it without
bothering the applications folks.  That's a project to figure out over
the holiday freeze, though.

I'm pretty sure all of production will be sles9x within the next 2
months - woo hoo!  The promise of better performance from WAS6.1 and
sles9x saving them a few IFLs is finally getting their attention.

(see you next week).

Marcy Cortes
 


-Original Message-
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
Mark Post
Sent: Friday, September 14, 2007 9:20 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

 On Fri, Sep 14, 2007 at  6:48 AM, in message
[EMAIL PROTECTED],
Evans, Kevin R [EMAIL PROTECTED] wrote: 
 Rob,
 
 As we are just switching to Omegamon and almost up to implementation 
 of our first user to come into a new zLinux front end, can you give 
 any further details on your comment below?

Prior to the kernels used in SLES10 and RHEL5, the way CPU consumption
was tracked by the Linux kernel didn't take into account that the system
may be running in a shared/virtualized environment.  The (valid until
LPARs, z/VM, VMware, and Xen) assumption in place was that the kernel
was in complete control of the hardware, so any passage of time between
the last clock value taken, and the current one, was assigned to
whatever process was dispatched in the interval.  The problem being, of
course, that the virtual machine/LPAR might not have been running at all
during that time.  So, Linux could report that the CPU was 100% busy,
when in fact it was only being dispatched, for example, 3% of the time.

Of the various performance monitors that were being marketed for
mainframe Linux, only Velocity Software's product combined the Linux
data with the z/VM monitor data, and normalized the Linux values to be
correct.  (Obviously this only worked in an environment where z/VM was
being used as the hypervisor.)  This was a big factor in many cases of
which monitor to choose.  Since the release of the cpu accounting
patches, and incorporation into SLES and RHEL, that's no longer the
case, unless you're still running SLES8 (Hi, Marcy!) and SLES9 (Hi,
almost everyone else!), or RHEL3 or 4.  Now the decision is based on
more traditional criteria, as opposed to being right or very wrong.

If you have a userid and password to access the SHARE proceedings, you
can see Martin Schwidefsky's presentation on this at
http://www.share.org/member_center/open_document.cfm?document=proceeding
s/SHARE_in_Seattle/S9266XX172938.pdf

(I have no idea why I didn't ask Martin for a copy of that for the
linuxvm.org web site.  Rats.)


Mark Post




Re: linux performance behind load balancer

2007-09-14 Thread barton

Forgive the soap box. This is old news.  Linux process data in any
virtual environment is wrong.  This was measured and presented in a
production environment as off by an order of magnitude. This is true for
all releases and distributions of Linux.  IBM claims there is a fix in
SLES10; this has never been validated in any presentation. I don't like
claims that sound suspiciously like vaporware.

Why is this data bad? It's useless.
1) Imagine someone doing application tuning with this data and thinking
they've improved their app performance - their data is wrong and leads
to the wrong conclusion.
2) When system utilization is high and you log on to any Linux guest
with Linux tools, you might think top has found the hog, but you will
make very poor choices of which processes or Linux servers to kill.
3) If you are making PoC decisions based on this data, you will think
the mainframe is dog slow. This is bad for all of us and leads to poor
financial decisions.

This gives a very good platform a bad image.

Six years ago, when Velocity Software analyzed this, we found a way to
correct and record the process data (for all Linux releases and
distributions), so this data is useful for all of the above. No other
vendor has presented a solution for this problem - and validated it in
any public forum.

So when I hear about installations planning to depend on garbage data, I
think about how many people would drive cars without gas gauges or
speedometers. Was the used-car salesperson ethical in taking advantage
of the naive buyer?






Evans, Kevin R wrote:

Rob,

As we are just switching to Omegamon and almost up to implementation of
our first user to come into a new zLinux front end, can you give any
further details on your comment below?

Thanks

Kevin

-Original Message-
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
Rob van der Heij
Sent: Thursday, September 13, 2007 4:07 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: linux performance behind load balancer

On 9/13/07, Alan Altmark [EMAIL PROTECTED] wrote:

Finishing the thought, IBM's OMEGAMON comes to mind as well.  There's more
than one decent performance monitor Out There, so shop and compare.

But since that will present incorrect CPU breakdown per Linux process,
it may lead to wrong conclusions. ESALPS will correct the CPU usage
for virtualization effects.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/





Re: linux performance behind load balancer

2007-09-14 Thread barton

There are some issues with WAS right now that seriously impact Linux
under z/VM.  Rob's out of town; he can explain it better.  The problem is
that the current JDK polls every 10ms.  This means the WAS servers stay
in queue. We have been seeing that the total-to-virtual storage
over-allocation ratios sites can attain have been dropping, and traced it
down to servers not dropping from queue. Rob tracked it down to the WAS
polling. We're hoping for relief next year. So be careful about the
performance "feature" of 6.1.
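
For anyone who wants to see the effect, the pattern being described is easy
to reproduce with a toy polling loop (Python here purely for illustration;
this is not the JDK's actual code). A guest running something like this
wakes up 100 times a second even when there is no work, so z/VM never sees
it go idle long enough to drop it from the dispatch list:

    import threading, time

    stop = threading.Event()

    def poll_for_work(have_work=lambda: False):
        # Wake every 10ms to check for work, even when there is none.
        while not stop.is_set():
            if have_work():
                pass          # handle the work here
            time.sleep(0.010) # 100 wake-ups per second
        # the guest never stays idle, so its working set stays resident

    t = threading.Thread(target=poll_for_work)
    t.start()
    time.sleep(5)             # let the demo run, then stop it
    stop.set()
    t.join()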



Marcy Cortes wrote:


(Hi Mark!)

I'm pretty sure all of production will be sles9x within the next 2
months - woo hoo!  The promise of better performance from WAS6.1 and
sles9x saving them a few IFLs is finally getting their attention.

(see you next week).

Marcy Cortes


-Original Message-
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
Mark Post
Sent: Friday, September 14, 2007 9:20 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer



On Fri, Sep 14, 2007 at  6:48 AM, in message


[EMAIL PROTECTED],
Evans, Kevin R [EMAIL PROTECTED] wrote:


Rob,

As we are just switching to Omegamon and almost up to implementation
of our first user to come into a new zLinux front end, can you give
any further details on your comment below?



Prior to the kernels used in SLES10 and RHEL5, the way CPU consumption
was tracked by the Linux kernel didn't take into account that the system
may be running in a shared/virtualized environment.  The (valid until
LPARs, z/VM, VMware, and Xen) assumption in place was that the kernel
was in complete control of the hardware, so any passage of time between
the last clock value taken, and the current one, was assigned to
whatever process was dispatched in the interval.  The problem being, of
course, that the virtual machine/LPAR might not have been running at all
during that time.  So, Linux could report that the CPU was 100% busy,
when in fact it was only being dispatched, for example, 3% of the time.

Of the various performance monitors that were being marketed for
mainframe Linux, only Velocity Software's product combined the Linux
data with the z/VM monitor data, and normalized the Linux values to be
correct.  (Obviously this only worked in an environment where z/VM was
being used as the hypervisor.)  This was a big factor in many cases of
which monitor to choose.  Since the release of the cpu accounting
patches, and incorporation into SLES and RHEL, that's no longer the
case, unless you're still running SLES8 (Hi, Marcy!) and SLES9 (Hi,
almost everyone else!), or RHEL3 or 4.  Now the decision is based on
more traditional criteria, as opposed to being right or very wrong.

If you have a userid and password to access the SHARE proceedings, you
can see Martin Schwidefsky's presentation on this at
http://www.share.org/member_center/open_document.cfm?document=proceeding
s/SHARE_in_Seattle/S9266XX172938.pdf

(I have no idea why I didn't ask Martin for a copy of that for the
linuxvm.org web site.  Rats.)


Mark Post






Re: linux performance behind load balancer

2007-09-14 Thread Mark Post
 On Fri, Sep 14, 2007 at  4:11 PM, in message
[EMAIL PROTECTED], barton
[EMAIL PROTECTED] wrote: 
 There are some issues with WAS right now that seriously impact Linux
 under z/VM.  Rob's out of town; he can explain it better.  The problem
 is that the current JDK polls every 10ms.  This means the WAS servers
 stay in queue. We have been seeing that the total-to-virtual storage
 over-allocation ratios sites can attain have been dropping, and traced
 it down to servers not dropping from queue. Rob tracked it down to the
 WAS polling. We're hoping for relief next year. So be careful about the
 performance "feature" of 6.1.

It was Rob working with me on the Linux/390 wiki system that led him to the 
discovery that the IBM JDK was issuing 10ms sleeps.  It wasn't just in the 
newer versions of the JDK, it was in the 1.4.2 ones as well.  So, upgrading to 
a newer version of WAS and its associated Java shouldn't be any worse in that 
regard.


Mark Post



Re: linux performance behind load balancer

2007-09-14 Thread Marcy Cortes
Production I'm not so worried about because it has adequate capacity (no
paging) and the servers run all the time anyway, so they don't drop from
queue because of real work.  They've benchmarked and measured (with
Velocity tools :) the differences between sles9x/was6 and sles8/was5 and
see significant differences.  We'll let you know for sure next week with
the real workload going through.

But this might explain the increased paging load on our test system,
which already was bursting at the seams.

Do you know what level of the JDK that started in? 


Marcy Cortes 
 


-Original Message-
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
barton
Sent: Friday, September 14, 2007 1:12 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

There are some issues with WAS right now that seriously impact Linux
under z/VM.  Rob's out of town; he can explain it better.  The problem is
that the current JDK polls every 10ms.  This means the WAS servers stay
in queue. We have been seeing that the total-to-virtual storage
over-allocation ratios sites can attain have been dropping, and traced it
down to servers not dropping from queue. Rob tracked it down to the WAS
polling. We're hoping for relief next year. So be careful about the
performance "feature" of 6.1.



Marcy Cortes wrote:

 (Hi Mark!)

 I'm pretty sure all of production will be sles9x within the next 2 
 months - woo hoo!  The promise of better performance from WAS6.1 and 
 sles9x saving them a few IFLs is finally getting their attention.

 (see you next week).

 Marcy Cortes


 -Original Message-
 From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of 
 Mark Post
 Sent: Friday, September 14, 2007 9:20 AM
 To: LINUX-390@VM.MARIST.EDU
 Subject: Re: [LINUX-390] linux performance behind load balancer


On Fri, Sep 14, 2007 at  6:48 AM, in message

 [EMAIL PROTECTED],
 Evans, Kevin R [EMAIL PROTECTED] wrote:

Rob,

As we are just switching to Omegamon and almost up to implementation 
of our first user to come into a new zLinux front end, can you give 
any further details on your comment below?


 Prior to the kernels used in SLES10 and RHEL5, the way CPU consumption
 was tracked by the Linux kernel didn't take into account that the system
 may be running in a shared/virtualized environment.  The (valid until
 LPARs, z/VM, VMware, and Xen) assumption in place was that the kernel
 was in complete control of the hardware, so any passage of time between
 the last clock value taken, and the current one, was assigned to
 whatever process was dispatched in the interval.  The problem being, of
 course, that the virtual machine/LPAR might not have been running at all
 during that time.  So, Linux could report that the CPU was 100% busy,
 when in fact it was only being dispatched, for example, 3% of the time.

 Of the various performance monitors that were being marketed for
 mainframe Linux, only Velocity Software's product combined the Linux
 data with the z/VM monitor data, and normalized the Linux values to be
 correct.  (Obviously this only worked in an environment where z/VM was
 being used as the hypervisor.)  This was a big factor in many cases of
 which monitor to choose.  Since the release of the cpu accounting
 patches, and incorporation into SLES and RHEL, that's no longer the
 case, unless you're still running SLES8 (Hi, Marcy!) and SLES9 (Hi,
 almost everyone else!), or RHEL3 or 4.  Now the decision is based on
 more traditional criteria, as opposed to being right or very wrong.

 If you have a userid and password to access the SHARE proceedings, you
 can see Martin Schwidefsky's presentation on this at
 http://www.share.org/member_center/open_document.cfm?document=proceedings/SHARE_in_Seattle/S9266XX172938.pdf

 (I have no idea why I didn't ask Martin for a copy of that for the
 linuxvm.org web site.  Rats.)


 Mark Post





Re: linux performance behind load balancer

2007-09-14 Thread Mark Post
 On Fri, Sep 14, 2007 at  4:57 PM, in message
[EMAIL PROTECTED], Marcy
Cortes [EMAIL PROTECTED] wrote: 
-snip-
 Do you know what level of the JDK that started in? 

It's been around for a while.  The version I was running at the time Rob first 
investigated was an older 1.4.2 release.


Mark Post



Re: linux performance behind load balancer

2007-09-14 Thread John Summerfield

barton wrote:

Forgive the soap box. This is old news.  Linux process data in any
virtual environment is wrong.  This was measured and presented in a
production environment as off by an order of magnitude. This is true for
all releases and distributions of Linux.  IBM claims there is a fix in
SLES10; this has never been validated in any presentation. I don't like
claims that sound suspiciously like vaporware.


Your own comments don't sound any better to me. Are you claiming that
the accounting patches mentioned elsewhere don't work as claimed? Can
you support that claim?




--

Cheers
John

-- spambait
[EMAIL PROTECTED]  [EMAIL PROTECTED]

Please do not reply off-list



Re: linux performance behind load balancer

2007-09-13 Thread Kate Riggsby
Thank you all for sharing experiences and for advice. It gives me
hope there may be a way around my brick wall!

Rob, page-in is what this problem feels like. But
the lpar has 6G/2G of main/xstor and the total virtual storage
of all the guests together is 3264M, of which the problem linux guest
has 2G.  VM reports 0 paging.

Just to clarify, there is one ifl on the z800. The lpar is running the
regular VM service machines, two small non-load-balanced linuxes
and then this problem instance. The linux I'm talking about
is load-balanced with five standalone (not on VM)
boxes. Our vm linux instance shouldn't be running a firewall
but it's another thing I should verify.

Mark, the lpar is running z/VM 5.2 at 0601. The linux guests are
running SLES9 SP3 (64bit). The system is using Performance Toolkit.
cat /proc/meminfo shows:
   MemTotal:  2050128 kB   LowFree:343412 kB
   MemFree:343412 kB   SwapTotal:  475852 kB
   Buffers:143588 kB   SwapFree:   475852 kB
   Cached:1128868 kB   Dirty: 444 kB
   SwapCached:  0 kB   Writeback:   0 kB
   Active:1035208 kB   Mapped: 279544 kB
   Inactive:   486132 kB   Slab:   163204 kB
   HighTotal:   0 kB   Committed_AS:  2997492 kB
   HighFree:0 kB   PageTables:   2496 kB
   LowTotal:  2050128 kB   VmallocTotal: 4292861952 kB
   VmallocUsed:  2532 kB
   VmallocChunk: 4292859180 kB
thanks,
kate

On 9/11/07, Rob van der Heij wrote:

Although you say there's enough real memory, it may be the system is
not configured correctly and still pages the Linux guests. Your
performance monitor should be able to provide more data than what you
mention in your post. You'd need to see whether it's indeed these two
Linux servers that consume the extra cycles, and if so, see which
processes are doing that in Linux. I would not expect that opening a
connection on port 443 and ending it would cause a lot of CPU activity
(unless it triggers firewalls in Linux).

I think I read from your post that there's one IFL on the z800. That
means that you probably don't make things go faster by spreading the
load over multiple virtual machines (actually, you will make it
slower). The folks who came up with the model of probing port 443 may
have had a different failure model than what's applicable to running
two Linux virtual machines on the same z/VM (but I also know that such
sometimes is a nasty fight).

Rob
--
Rob van der Heij



Re: linux performance behind load balancer

2007-09-13 Thread barton

A decent performance monitor (ESALPS comes to mind) will tell you exactly what 
processes
are using the cpu and exactly how much. Have you considered running a decent 
performance
monitor?



Kate Riggsby wrote:


Thank you all for sharing experiences and for advice. It gives me
hope there may be a way around my brick wall!

Rob, page-in is what this problem feels like. But
the lpar has 6G/2G of main/xstor and the total virtual storage
of all the guests together is 3264M, of which the problem linux guest
has 2G.  VM reports 0 paging.

Just to clarify, there is one ifl on the z800. The lpar is running the
regular VM service machines, two small non-load-balanced linuxes
and then this problem instance. The linux I'm talking about
is load-balanced with five standalone (not on VM)
boxes. Our vm linux instance shouldn't be running a firewall
but it's another thing I should verify.

Mark, the lpar is running z/VM 5.2 at 0601. The linux guests are
running SLES9 SP3 (64bit). The system is using Performance Toolkit.
cat /proc/meminfo shows:
   MemTotal:  2050128 kB   LowFree:343412 kB
   MemFree:343412 kB   SwapTotal:  475852 kB
   Buffers:143588 kB   SwapFree:   475852 kB
   Cached:1128868 kB   Dirty: 444 kB
   SwapCached:  0 kB   Writeback:   0 kB
   Active:1035208 kB   Mapped: 279544 kB
   Inactive:   486132 kB   Slab:   163204 kB
   HighTotal:   0 kB   Committed_AS:  2997492 kB
   HighFree:0 kB   PageTables:   2496 kB
   LowTotal:  2050128 kB   VmallocTotal: 4292861952 kB
   VmallocUsed:  2532 kB
   VmallocChunk: 4292859180 kB
thanks,
kate

On 9/11/07, Rob van der Heij wrote:



Although you say there's enough real memory, it may be the system is
not configured correctly and still pages the Linux guests. Your
performance monitor should be able to provide more data than what you
mention in your post. You'd need to see whether it's indeed these two
Linux servers that consume the extra cycles, and if so, see which
processes are doing that in Linux. I would not expect that opening a
connection on port 443 and ending it would cause a lot of CPU activity
(unless it triggers firewalls in Linux).

I think I read from your post that there's one IFL on the z800. That
means that you probably don't make things go faster by spreading the
load over multiple virtual machines (actually, you will make it
slower). The folks who came up with the model of probing port 443 may
have had a different failure model than what's applicable to running
two Linux virtual machines on the same z/VM (but I also know that such
sometimes is a nasty fight).

Rob
--
Rob van der Heij












Re: linux performance behind load balancer

2007-09-13 Thread Alan Altmark
On Thursday, 09/13/2007 at 10:22 EDT, barton
[EMAIL PROTECTED] wrote:
 A decent performance monitor (ESALPS comes to mind) will tell you
 exactly what processes are using the cpu and exactly how much. Have you
 considered running a decent performance monitor?

Finishing the thought, IBM's OMEGAMON comes to mind as well.  There's more
than one decent performance monitor Out There, so shop and compare.

Real point:  Successful z/VM+Linux deployments include, among other
things, tools that can monitor resource consumption of the box, your LPAR,
and your Linux guests.  But as has been noted, while that function is
necessary, it is not, by any reasonable measure, sufficient.  You must
also be able to correlate that with information on what's going on
*inside* the guest.

IMO, they should be part of POCs, too.

Alan Altmark
z/VM Development
IBM Endicott



Re: linux performance behind load balancer

2007-09-13 Thread Marcy Cortes
As well as inside your App Server, if you are using one of those.  It's
easy to create bad Java or a misconfigured WAS :).  Introscope and ITCAM
are two examples of those.


Marcy Cortes



-Original Message-
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Alan
Altmark
Sent: Thursday, September 13, 2007 7:56 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

On Thursday, 09/13/2007 at 10:22 EDT, barton
[EMAIL PROTECTED] wrote:
 A decent performance monitor (ESALPS comes to mind) will tell you
 exactly what processes are using the cpu and exactly how much. Have you
 considered running a decent performance monitor?

Finishing the thought, IBM's OMEGAMON comes to mind as well.  There's more
than one decent performance monitor Out There, so shop and compare.

Real point:  Successful z/VM+Linux deployments include, among other things,
tools that can monitor resource consumption of the box, your LPAR, and your
Linux guests.  But as has been noted, while that function is necessary, it
is not, by any reasonable measure, sufficient.  You must also be able to
correlate that with information on what's going on
*inside* the guest.

IMO, they should be part of POCs, too.

Alan Altmark
z/VM Development
IBM Endicott







Re: linux performance behind load balancer

2007-09-13 Thread Rob van der Heij
On 9/13/07, Alan Altmark [EMAIL PROTECTED] wrote:

 Finishing the thought, IBM's OMEGAMON comes to mind as well.  There's more
 than one decent performance monitor Out There, so shop and compare.

But since that will present incorrect CPU breakdown per Linux process,
it may lead to wrong conclusions. ESALPS will correct the CPU usage
for virtualization effects.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/



Re: linux performance behind load balancer

2007-09-13 Thread Alan Altmark
On Thursday, 09/13/2007 at 04:07 EDT, Rob van der Heij
[EMAIL PROTECTED] wrote:
 On 9/13/07, Alan Altmark [EMAIL PROTECTED] wrote:

  Finishing the thought, IBM's OMEGAMON comes to mind as well.  There's more
  than one decent performance monitor Out There, so shop and compare.

 But since that will present incorrect CPU breakdown per Linux process,
 it may lead to wrong conclusions. ESALPS will correct the CPU usage
 for virtualization effects.

SLES 10 and RHEL 5 correct for the virtualization effects and OMEGAMON
gives the normalized numbers.

But the importance of that depends on what you want to know, doesn't it?
If you're interested in which Linux process is hogging the guest, the
absolute number is irrelevant.

It *is* true that for capacity planning and chargeback you need a better
absolute number than what older distros provide.  As a result, we decided
to update OMEGAMON to normalize those numbers for guests that don't
generate the more accurate data.  We hope to deliver that late this year
or early next.

Alan Altmark
z/VM Development
IBM Endicott



Re: linux performance behind load balancer

2007-09-13 Thread Mark Post
 On Thu, Sep 13, 2007 at  9:38 AM, in message
[EMAIL PROTECTED], Kate Riggsby [EMAIL PROTECTED]
wrote: 
-snip-
 Mark, the lpar is running z/VM 5.2 at 0601. The linux guests are
 running SLES9 SP3 (64bit). The system is using Performance Toolkit.
 cat /proc/meminfo shows:
MemTotal:  2050128 kB   LowFree:343412 kB
MemFree:343412 kB   SwapTotal:  475852 kB
Buffers:143588 kB   SwapFree:   475852 kB
Cached:1128868 kB   Dirty: 444 kB
SwapCached:  0 kB   Writeback:   0 kB
Active:1035208 kB   Mapped: 279544 kB
Inactive:   486132 kB   Slab:   163204 kB
HighTotal:   0 kB   Committed_AS:  2997492 kB
HighFree:0 kB   PageTables:   2496 kB
LowTotal:  2050128 kB   VmallocTotal: 4292861952 kB
VmallocUsed:  2532 kB
VmallocChunk: 4292859180 kB

As lots of other people have already said (and with whom I agree), you really 
need some sort of performance monitor to figure out problems such as this.  One 
thing I will say regardless of that, is that it looks like you have almost half 
a gig of inactive pages in your guest.  That means that you can likely reduce 
your guest size by about that much (subject to subsequent measurement of the 
results), and see if your Linux paging rates go up a whole bunch.  The fact 
that your system is using about 1GB of storage for caching tells me that likely 
they won't.  (And, don't confuse page space in use with paging I/O _rates_.  
The first is OK, the second needs to be watched closely.)
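
A quick way to get a feel for those numbers is something like the rough
sketch below (Python, illustration only; verify any resulting guest-size
change with real measurements, as noted above):

    # Read /proc/meminfo and show how much of the guest is free, inactive,
    # or page cache -- a first-cut indication of how far the virtual
    # machine size might be trimmed.
    def meminfo():
        values = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                values[key] = int(rest.strip().split()[0])   # value in kB
        return values

    m = meminfo()
    for key in ("MemTotal", "MemFree", "Inactive", "Cached"):
        print("%-10s %8.0f MB" % (key, m.get(key, 0) / 1024.0))
    slack = m.get("MemFree", 0) + m.get("Inactive", 0)
    print("Rough trim candidate: about %.0f MB" % (slack / 1024.0))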

Doing this is not likely to have an impact on your immediate problem, but 
keeping your guest sizes as small as possible is a good habit to get into now.

Second, since this is a brand new install, why are you using SLES9, and not 
SLES10?  ISV certifications and such is a perfectly good reason, of course, but 
if there's nothing like that standing in the way, I would recommend using 
SLES10 SP1.


Mark Post



Re: linux performance behind load balancer

2007-09-13 Thread Mark Post
 On Thu, Sep 13, 2007 at  9:45 PM, in message
[EMAIL PROTECTED], Alan
Altmark [EMAIL PROTECTED] wrote: 
-snip-
 SLES 10 and RHEL 5 correct for the virtualization effects and OMEGAMON
 gives the normalized numbers.

Unfortunately in this case, SLES9 is being used, so the incorrect performance 
data is a problem.  (One of the fairly numerous reasons I would recommend using 
SLES10 SP1.)

 But the importance of that depends on what you want to know, doesn't it?
 If you're interested in which Linux process is hogging the guest, the
 absolute number is irrelevant.

With the way things were designed in the 2.4 and earlier 2.6 kernels, that 
might be somewhat suspect as well.  I've seen cases where the bad guy process 
wasn't easy to find.


Mark Post



Re: linux performance behind load balancer

2007-09-12 Thread Mark Post
 On Mon, Sep 10, 2007 at  1:20 PM, in message
[EMAIL PROTECTED], Kate Riggsby [EMAIL PROTECTED]
wrote: 
-snip-
The linux userid running the application was using about 3-4% of the
 cpu. The day we added our instance to the (external) load balancer its
 base cpu consumption went to 18% of the ifl, even during application 
 downtime.

Kate,

What distribution (and version) are you running?  What performance monitor are 
you using?

As Rob said, much more information is needed to even start to figure this out.  
It doesn't seem at all reasonable to me that simple https connections would 
drive so much CPU utilization (although https uses more CPU than simple http, 
due to the SSL component).

Is this running on z/VM?  If so, how big is the guest?  How much virtual 
storage did you give it?  What does cat /proc/meminfo show you?


Mark Post



Re: linux performance behind load balancer

2007-09-11 Thread Rob van der Heij
On 9/10/07, Kate Riggsby [EMAIL PROTECTED] wrote:

 I have been told that the load-balancer polls are an exchange of
 Hello/Server Hello packets on port 443 (not a full-blown SSL handshake)
 every 2 seconds.

Although you say there's enough real memory, it may be the system is
not configured correctly and still pages the Linux guests. Your
performance monitor should be able to provide more data than what you
mention in your post. You'd need to see whether it's indeed these two
Linux servers that consume the extra cycles, and if so, see which
processes are doing that in Linux. I would not expect that opening a
connection on port 443 and ending it would cause a lot of CPU activity
(unless it triggers firewalls in Linux).
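
One way to separate the suspects is to reproduce the probe yourself from
another box and watch the guest with your performance monitor. A throwaway
Python sketch follows; the host name is a placeholder, and note it only
opens and closes the TCP connection on port 443 - it does not send the
Hello exchange the load balancer uses:

    import socket, time

    HOST, PORT, PROBES = "your-linux-guest", 443, 30

    for _ in range(PROBES):
        try:
            s = socket.create_connection((HOST, PORT), timeout=2)
            s.close()                 # open and drop, like a health check
        except OSError as e:
            print("probe failed:", e)
        time.sleep(2)                 # the reported 2-second poll interval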

I think I read from your post that there's one IFL on the z800. That
means that you probably don't make things go faster by spreading the
load over multiple virtual machines (actually, you will make it
slower). The folks who came up with the model of probing port 443 may
have had a different failure model than what's applicable to running
two Linux virtual machines on the same z/VM (but I also know that such
sometimes is a nasty fight).

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/



linux performance behind load balancer

2007-09-10 Thread Kate Riggsby
Greetings,

As part of our linux proof-of-concept project we built a new instance
of the servers which provide our big student services
application. The application runs on Oracle Web Application Server.
The zlinux instance is running pretty much alone on a z/800
ifl and has oodles of real memory. The application only accepts work
from 7am to midnight; the rest of the time it responds to any queries by
putting up a page listing the hours of availability.

   The linux userid running the application was using about 3-4% of the
cpu. The day we added our instance to the (external) load balancer its
base cpu consumption went to 18% of the ifl, even during application downtime.
It does seem to be able to do its share of the work by using an additional
15% of the cpu when the application is open, but we are puzzled that
the polls by the load balancer seem to eat so much of a z/800 ifl.
The participating standalone (Dell) boxes get polled too but run
at 1% during downtime.

  Our IBM business partner is helping us investigate, but I thought
I'd ask this forum of experienced users if you've seen/conquered
performance problems running behind load balancers, or have an
opinion about how much work will fit on a z/800?

I have been told that the load-balancer polls are an exchange of
Hello/Server Hello packets on port 443 (not a full-blown SSL handshake)
every 2 seconds.

thanks,
kate

Kate Riggsby
University of Tennessee



Re: linux performance behind load balancer

2007-09-10 Thread Marcy Cortes
I don't know how helpful this is

But we do run F5 load balancers in front of our biggest app.  There are 2
servers hitting us every 5 seconds each for HTTP and 2 for HTTPS.
So, a total of 4 hits every 5 seconds.  But it runs across 17 z9 EC IFLs
and there's never an idle time, so I couldn't really tell you exactly how
much CPU that accounted for, but ...  Very rough math here ... We get
about 130 TPS at 60% busy, so 2 TPS is about 1% busy.  1% of 17 IFLs = .17
IFL, or 17% of an IFL.  If the trans were full blown -- but you said they
are not the full blown trans...
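
The same back-of-the-envelope math, written out (same numbers as above;
treat the result as an order-of-magnitude guess, not a measurement):

    ifls = 17
    tps = 130.0          # transactions per second observed
    busy = 0.60          # utilization of the 17 IFLs at that rate

    per_tps = busy / tps             # share of the complex per 1 TPS
    print("1 TPS ~ %.2f%% of the complex" % (100.0 * per_tps))
    print("2 TPS ~ %.2f IFL (if they were full-blown trans)"
          % (2 * per_tps * ifls))
    # The health-check polls are only 4 hits per 5 seconds (0.8/s) and are
    # not full transactions, so their real cost should be well below that.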

Can you take your interval from 2 seconds to a higher number and see
what happens?

(and yes, they are fat cpu intensive trans in case anyone wonders :)

You can also check in your HTTP logs how often they really do hit you.

Marcy Cortes 
 


-Original Message-
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
Kate Riggsby
Sent: Monday, September 10, 2007 10:21 AM
To: LINUX-390@VM.MARIST.EDU
Subject: [LINUX-390] linux performance behind load balancer

Greetings,

As part of our linux proof-of-concept project we built a new instance of
the servers which provide our big student services application. The
application runs on Oracle Web Application Server.
The zlinux instance is running pretty much alone on a z/800 ifl and has
oodles of real memory. The application only accepts work from 7am to
midnight; the rest of the time it responds to any queries by putting up
a page listing the hours of availability.

   The linux userid running the application was using about 3-4% of the
cpu. The day we added our instance to the (external) load balancer its
base cpu consumption went to 18% of the ifl, even during application
downtime.
It does seem to be able to do its share of the work by using an
additional 15% of the cpu when the application is open, but we are
puzzled that the polls by the load balancer seem to eat so much of a
z/800 ifl.
The participating standalone (Dell) boxes get polled too but run at 1%
during downtime.

  Our IBM business partner is helping us investigate, but I thought I'd
ask this forum of experienced users if you've seen/conquered performance
problems running behind load balancers, or have an opinion about how
much work will fit on a z/800?

I have been told that the load-balancer polls are an exchange of
Hello/Server Hello packets on port 443 (not a full-blown SSL handshake)
every 2 seconds.

thanks,
kate

Kate Riggsby
University of Tennessee




Re: linux performance behind load balancer

2007-09-10 Thread John Summerfield

Marcy Cortes wrote:

I don't know how helpful this is

But we do run F5 load balancers in front of our biggest app.  There are 2
servers hitting us every 5 seconds each for HTTP and 2 for HTTPS.
So, a total of 4 hits every 5 seconds.  But it runs across 17 z9 EC IFLs
and there's never an idle time, so I couldn't really tell you exactly how
much CPU that accounted for, but ...  Very rough math here ... We get
about 130 TPS at 60% busy, so 2 TPS is about 1% busy.  1% of 17 IFLs = .17
IFL, or 17% of an IFL.  If the trans were full blown -- but you said they
are not the full blown trans...

Can you take your interval from 2 seconds to a higher number and see
what happens?

(and yes, they are fat cpu intensive trans in case anyone wonders :)

You can also check in your HTTP logs how often they really do hit you.


and also use iptables to drop the packets, and see what that does.



--

Cheers
John

-- spambait
[EMAIL PROTECTED]  [EMAIL PROTECTED]

Please do not reply off-list
