linux performance behind load balancer
Greetings,

As part of our Linux proof-of-concept project we built a new instance of the servers which provide our big student services application. The application runs on Oracle Web Application Server. The zLinux instance is running pretty much alone on a z800 IFL and has oodles of real memory. The application only accepts work from 7am to midnight; the rest of the time it responds to any queries by putting up a page listing the hours of availability.

The Linux userid running the application was using about 3-4% of the CPU. The day we added our instance to the (external) load balancer, its base CPU consumption went to 18% of the IFL, even during application downtime. It does seem to be able to do its share of the work by using an additional 15% of the CPU when the application is open, but we are puzzled that the polls by the load balancer seem to eat so much of a z800 IFL. The participating standalone (Dell) boxes get polled too, but run at <1% during downtime.

Our IBM business partner is helping us investigate, but I thought I'd ask this forum of experienced users whether you've seen/conquered performance problems running behind load balancers, or have an opinion about how much work will fit on a z800.

I have been told that the load-balancer polls are an exchange of Hello/Server Hello packets on port 443 (not a full-blown SSL handshake) every 2 seconds.

thanks, kate

Kate Riggsby
University of Tennessee
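A quick way to confirm on the wire what the health checks actually do, and how often they really arrive, is a packet trace on the guest. This is only a sketch: the interface name eth0 is an assumption, so substitute your qeth/OSA device.

    # Capture 40 packets of health-check traffic on the HTTPS port and
    # check the probe interval and whether a handshake really occurs.
    tcpdump -nn -i eth0 -c 40 'tcp port 443'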
Re: linux performance behind load balancer
I don't know how helpful this is, but we do run F5 load balancers in front of our biggest app. There are 2 servers hitting us every 5 seconds for HTTP and 2 for HTTPS, so a total of 4 hits every 5 seconds. But it runs across 17 z9 EC IFLs and there's never an idle time, so I couldn't really tell you exactly how much CPU that accounted for, but ...

Very rough math here ... We get about 130 TPS at 60% busy, so 2 TPS is about 1% busy. 1% of 17 IFLs = .17 IFL, or 17% of an IFL -- if the trans were full blown, but you said they are not full-blown trans ... (And yes, they are fat, CPU-intensive trans, in case anyone wonders :)

Can you take your interval from 2 seconds up to a higher number and see what happens? You can also check in your HTTP logs how often they really do hit you.

Marcy Cortes

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Kate Riggsby
Sent: Monday, September 10, 2007 10:21 AM
To: LINUX-390@VM.MARIST.EDU
Subject: [LINUX-390] linux performance behind load balancer

-snip-
Re: linux performance behind load balancer
Marcy Cortes wrote:
> Can you take your interval from 2 seconds up to a higher number and
> see what happens? You can also check in your HTTP logs how often they
> really do hit you.

... and also use iptables to drop the packets, and see what that does.

--
Cheers
John

Please do not reply off-list
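A minimal sketch of John's suggestion. The balancer address 192.0.2.10 is a placeholder, and dropping the probes will make the balancer mark this node down, so only try it in a window where that is acceptable:

    # Drop the balancer's health probes to port 443 and watch whether
    # the guest's base CPU consumption falls back toward the old 3-4%.
    iptables -A INPUT -s 192.0.2.10 -p tcp --dport 443 -j DROP

    # After observing for a few minutes, remove the rule again:
    iptables -D INPUT -s 192.0.2.10 -p tcp --dport 443 -j DROP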
Re: linux performance behind load balancer
On 9/10/07, Kate Riggsby <[EMAIL PROTECTED]> wrote:
> I have been told that the load-balancer polls are an exchange of
> Hello/Server Hello packets on port 443 (not a full-blown SSL handshake)
> every 2 seconds.

Although you say there's enough real memory, it may be that the system is not configured correctly and still pages the Linux guests. Your performance monitor should be able to provide more data than what you mention in your post. You'd need to see whether it's indeed these two Linux servers that consume the extra cycles, and if so, see which processes are doing that in Linux. I would not expect that opening a connection on port 443 and ending it would cause a lot of CPU activity (unless it triggers firewalls in Linux).

I think I read from your post that there's one IFL on the z800. That means that you probably don't make things go faster by spreading the load over multiple virtual machines (actually, you will make it slower). The folks who came up with the model of probing port 443 may have had a different failure model in mind than what's applicable to running two Linux virtual machines on the same z/VM (but I also know that changing such things is sometimes a nasty fight).

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
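For the "which processes" step, standard procps tools inside the guest give a first cut. A sketch; keep in mind that on SLES9 these percentages are not corrected for virtualization, as discussed later in this thread:

    # List the top CPU consumers, highest first.
    ps -eo pid,comm,pcpu,cputime --sort=-pcpu | head -15

    # Or watch interactively with a 5-second sample interval.
    top -d 5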
Re: linux performance behind load balancer
>>> On Mon, Sep 10, 2007 at 1:20 PM, in message <[EMAIL PROTECTED]>, Kate Riggsby <[EMAIL PROTECTED]> wrote:
-snip-
> The linux userid running the application was using about 3-4% of the
> cpu. The day we added our instance to the (external) load balancer its
> base cpu consumption went to 18% of the ifl, even during application
> downtime.

Kate,

What distribution (and version) are you running? What performance monitor are you using? As Rob said, much more information is needed to even start to figure this out. It doesn't seem at all reasonable to me that simple https connections would drive so much CPU utilization (although https uses more CPU than simple http, due to the SSL component).

Is this running on z/VM? If so, how big is the guest? How much virtual storage did you give it? What does "cat /proc/meminfo" show you?

Mark Post
Re: linux performance behind load balancer
Thank you all for sharing experiences and for advice. It gives me hope there may be a way around my brick wall!

Rob, page-in is what this problem feels like. But the LPAR has 6G/2G of main/xstor, the total virtual storage of all the guests together is 3264M, and of that the problem Linux guest has 2G. VM reports 0 paging. Just to clarify, there is one IFL on the z800. The LPAR is running the regular VM service machines, two small non-load-balanced Linuxes, and then this problem instance. The Linux I'm talking about is load-balanced with five standalone (not on VM) boxes. Our VM Linux instance shouldn't be running a firewall, but it's another thing I should verify.

Mark, the LPAR is running z/VM 5.2 at 0601. The Linux guests are running SLES9 SP3 (64-bit). The system is using Performance Toolkit. cat /proc/meminfo shows:

    MemTotal:      2050128 kB    LowFree:        343412 kB
    MemFree:        343412 kB    SwapTotal:      475852 kB
    Buffers:        143588 kB    SwapFree:       475852 kB
    Cached:        1128868 kB    Dirty:             444 kB
    SwapCached:          0 kB    Writeback:           0 kB
    Active:        1035208 kB    Mapped:         279544 kB
    Inactive:       486132 kB    Slab:           163204 kB
    HighTotal:           0 kB    Committed_AS:  2997492 kB
    HighFree:            0 kB    PageTables:       2496 kB
    LowTotal:      2050128 kB    VmallocTotal: 4292861952 kB
    VmallocUsed:      2532 kB
    VmallocChunk: 4292859180 kB

thanks, kate

On 9/11/07, Rob van der Heij wrote:
> Although you say there's enough real memory, it may be that the system
> is not configured correctly and still pages the Linux guests. Your
> performance monitor should be able to provide more data than what you
> mention in your post.
-snip-
Re: linux performance behind load balancer
A decent performance monitor (ESALPS comes to mind) will tell you exactly what processes are using the CPU and exactly how much. Have you considered running a decent performance monitor?

Kate Riggsby wrote:
> Thank you all for sharing experiences and for advice. It gives me hope
> there may be a way around my brick wall!
-snip-
Re: linux performance behind load balancer
On Thursday, 09/13/2007 at 10:22 EDT, barton <[EMAIL PROTECTED]> wrote:
> A decent performance monitor (ESALPS comes to mind) will tell you
> exactly what processes are using the CPU and exactly how much. Have
> you considered running a decent performance monitor?

Finishing the thought, IBM's OMEGAMON comes to mind as well. There's more than one "decent" performance monitor Out There, so shop and compare.

Real point: Successful z/VM+Linux deployments include, among other things, tools that can monitor resource consumption of the box, your LPAR, and your Linux guests. But as has been noted, while that function is necessary, it is not, by any reasonable measure, sufficient. You must also be able to correlate that with information on what's going on *inside* the guest. IMO, they should be part of POCs, too.

Alan Altmark
z/VM Development
IBM Endicott
Re: linux performance behind load balancer
As well as inside your app server, if you are using one of those. It's easy to create bad Java or a misconfigured WAS :). Introscope and ITCAM are 2 examples of those monitors.

Marcy Cortes

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Alan Altmark
Sent: Thursday, September 13, 2007 7:56 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

-snip-
Re: linux performance behind load balancer
On 9/13/07, Alan Altmark <[EMAIL PROTECTED]> wrote:
> Finishing the thought, IBM's OMEGAMON comes to mind as well. There's
> more than one "decent" performance monitor Out There, so shop and
> compare.

But since that will present an incorrect CPU breakdown per Linux process, it may lead to wrong conclusions. ESALPS will correct the CPU usage for virtualization effects.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
Re: linux performance behind load balancer
On Thursday, 09/13/2007 at 04:07 EDT, Rob van der Heij <[EMAIL PROTECTED]> wrote:
> But since that will present an incorrect CPU breakdown per Linux
> process, it may lead to wrong conclusions. ESALPS will correct the CPU
> usage for virtualization effects.

SLES 10 and RHEL 5 correct for the virtualization effects, and OMEGAMON gives the "normalized" numbers. But the importance of that depends on what you want to know, doesn't it? If you're interested in which Linux process is hogging the guest, the absolute number is irrelevant.

It *is* true that for capacity planning and chargeback you need a better absolute number than what older distros provide. As a result, we decided to update OMEGAMON to normalize those numbers for guests that don't generate the more accurate data. We hope to deliver that late this year or early next.

Alan Altmark
z/VM Development
IBM Endicott
Re: linux performance behind load balancer
>>> On Thu, Sep 13, 2007 at 9:38 AM, in message <[EMAIL PROTECTED]>, Kate Riggsby <[EMAIL PROTECTED]> wrote:
-snip-
> Mark, the LPAR is running z/VM 5.2 at 0601. The Linux guests are
> running SLES9 SP3 (64-bit). The system is using Performance Toolkit.
> cat /proc/meminfo shows:
-snip-

As lots of other people have already said (and with whom I agree), you really need some sort of performance monitor to figure out problems such as this.

One thing I will say regardless of that is that it looks like you have almost half a gig of inactive pages in your guest. That means you can likely reduce your guest size by about that much (subject to subsequent measurement of the results), and see if your Linux paging rates go up a whole bunch. The fact that your system is using about 1GB of storage for caching tells me that they likely won't. (And don't confuse page space in use with paging I/O _rates_. The first is OK; the second needs to be watched closely.) Doing this is not likely to have an impact on your immediate problem, but keeping your guest sizes as small as possible is a good habit to get into now.

Second, since this is a brand new install, why are you using SLES9 and not SLES10? ISV certifications and such are a perfectly good reason, of course, but if there's nothing like that standing in the way, I would recommend using SLES10 SP1.

Mark Post
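The arithmetic Mark is doing can be pulled straight from /proc/meminfo. A small sketch; the field names are the standard 2.6-kernel ones and the MB rounding is approximate:

    # Inactive pages are candidates for trimming the guest size; a large
    # Cached figure suggests trimming won't immediately force paging.
    awk '/^(MemTotal|MemFree|Cached|Inactive):/ \
        { printf "%-12s %6.0f MB\n", $1, $2/1024 }' /proc/meminfo

On Kate's numbers this reports roughly Inactive 475 MB ("almost half a gig") and Cached 1102 MB ("about 1GB").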
Re: linux performance behind load balancer
>>> On Thu, Sep 13, 2007 at 9:45 PM, in message <[EMAIL PROTECTED]>, Alan Altmark <[EMAIL PROTECTED]> wrote:
-snip-
> SLES 10 and RHEL 5 correct for the virtualization effects and OMEGAMON
> gives the "normalized" numbers.

Unfortunately in this case, SLES9 is being used, so the incorrect performance data is a problem. (One of the fairly numerous reasons I would recommend using SLES10 SP1.)

> But the importance of that depends on what you want to know, doesn't
> it? If you're interested in which Linux process is hogging the guest,
> the absolute number is irrelevant.

With the way things were designed in the 2.4 and earlier 2.6 kernels, that might be somewhat suspect as well. I've seen cases where the "bad guy" process wasn't easy to find.

Mark Post
Re: linux performance behind load balancer
On 9/14/07, Alan Altmark <[EMAIL PROTECTED]> wrote:
> But the importance of that depends on what you want to know, doesn't
> it? If you're interested in which Linux process is hogging the guest,
> the absolute number is irrelevant.

But if you're comparing usage before and after some configuration change, it does become important. Simply by generating load in other virtual machines that were idle before, Linux will think the real business process takes more CPU resources per transaction than it did before. And Linux tools will tell you so, even though it is not true. That's what may make you bark up the wrong tree.

But you *are* very right that a performance monitor should be part of your Proof of Concept. We don't see a PoC fail these days because software does not work or cannot be found. We see it fail because people suffer from poor performance and have nobody to turn to for help in that new environment. So they get folks with Linux skills on discrete servers, and those folks do all the wrong things, because tuning Linux on z/VM is rarely intuitive. Or folks use the wrong tools to measure, and draw the wrong conclusions about the capacity of their installation and their TCO if they grow it.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
Re: linux performance behind load balancer
Rob,

As we are just switching to OMEGAMON, and almost up to implementation of our first user to come into a new zLinux front end, can you give any further details on your comment below?

Thanks,
Kevin

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Rob van der Heij
Sent: Thursday, September 13, 2007 4:07 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: linux performance behind load balancer

> But since that will present an incorrect CPU breakdown per Linux
> process, it may lead to wrong conclusions. ESALPS will correct the CPU
> usage for virtualization effects.

-snip-
Re: linux performance behind load balancer
On Friday, 09/14/2007 at 02:19 EDT, Rob van der Heij <[EMAIL PROTECTED]> wrote:
> But you *are* very right that a performance monitor should be part of
> your Proof of Concept. We don't see a PoC fail these days because
> software does not work or cannot be found. We see it fail because
> people suffer from poor performance and have nobody to turn to for
> help in that new environment.
-snip-

Amen.

Alan Altmark
z/VM Development
IBM Endicott
Re: linux performance behind load balancer
>>> On Fri, Sep 14, 2007 at 6:48 AM, in message <[EMAIL PROTECTED]>, "Evans, Kevin R" <[EMAIL PROTECTED]> wrote:
> As we are just switching to OMEGAMON, and almost up to implementation
> of our first user to come into a new zLinux front end, can you give
> any further details on your comment below?

Prior to the kernels used in SLES10 and RHEL5, the way CPU consumption was tracked by the Linux kernel didn't take into account that the system might be running in a shared/virtualized environment. The assumption in place (valid until LPARs, z/VM, VMware, and Xen came along) was that the kernel was in complete control of the hardware, so any passage of time between the last clock value taken and the current one was assigned to whatever process was dispatched in the interval. The problem being, of course, that the virtual machine/LPAR might not have been running at all during that time. So, Linux could report that the CPU was 100% busy when in fact it was only being dispatched, for example, 3% of the time.

Of the various performance monitors that were being marketed for mainframe Linux, only Velocity Software's product combined the Linux data with the z/VM monitor data and normalized the Linux values to be correct. (Obviously this only worked in an environment where z/VM was being used as the hypervisor.) This was a big factor in many decisions of which monitor to choose. Since the release of the "cpu accounting" patches, and their incorporation into SLES and RHEL, that's no longer the case, unless you're still running SLES8 (Hi, Marcy!) or SLES9 (Hi, almost everyone else!), or RHEL3 or 4. Now the decision is based on more traditional criteria, as opposed to being right or very wrong.

If you have a userid and password to access the SHARE proceedings, you can see Martin Schwidefsky's presentation on this at http://www.share.org/member_center/open_document.cfm?document=proceedings/SHARE_in_Seattle/S9266XX172938.pdf (I have no idea why I didn't ask Martin for a copy of that for the linuxvm.org web site. Rats.)

Mark Post
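On kernels that carry the CPU accounting changes, time stolen by the hypervisor is reported separately, which is what makes the per-process numbers honest again. A minimal sketch of checking it, assuming the "steal" column in /proc/stat (present on the 2.6 kernels in SLES10/RHEL5, but not on SLES9):

    # Fields after "cpu": user nice system idle iowait irq softirq steal.
    # A large steal figure is time the guest was not dispatched at all --
    # time an old kernel would have charged to its own processes.
    awk '/^cpu / { printf "steal: %d of %d total jiffies\n", \
        $9, $2+$3+$4+$5+$6+$7+$8+$9 }' /proc/stat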
Re: linux performance behind load balancer
(Hi Mark!)

That's the disadvantage of starting before everyone else and having too many servers :) At least I've killed the SLES7s! The problem with going from SLES8 to SLES9x is that it's a new server. That requires the cooperation of the users, and they don't like to do that if everything is all hunky-dory. They have other things to do (so they tell me). I'm hoping SLES9x to SLES10x is a true upgrade and we can do it without bothering the applications folks. That's a project to figure out over the holiday freeze, though.

I'm pretty sure all of production will be SLES9x within the next 2 months - woo hoo! The promise of better performance from WAS 6.1 and SLES9x saving them a few IFLs is finally getting their attention.

(See you next week.)

Marcy Cortes

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Mark Post
Sent: Friday, September 14, 2007 9:20 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

-snip-
Re: linux performance behind load balancer
Forgive the soap box. This is old news. Linux process data in any virtual environment is wrong. This was measured and presented in a production environment as off by an order of magnitude. This is true for all releases and distributions of Linux. IBM claims there is a fix in SLES10; this has never been validated in any presentation. I don't like claims that sound suspiciously like vaporware.

Why is this data bad? It's useless:

1) Imagine someone doing application tuning using this data and thinking they've improved their app performance - their data is wrong and leads to the wrong conclusion.

2) If system utilization is high and you log on to any Linux using Linux tools, you might think "top" is the hog, but you will make very poor choices of which processes or which Linux server to kill.

3) If you are making PoC decisions based on this data, you will think the mainframe is dog slow. This is bad for all of us and leads to poor financial decisions. It gives a very good platform a bad image.

6 years ago, when Velocity Software analyzed this, we found a way to correct and record the process data (for all Linux releases and distributions), thus making this data useful for all of the above. No other vendor (other than Velocity Software) has presented a solution for this problem - and validated it in any public forum.

So when I hear about installations planning on dependencies on garbage data, I think about how many people would drive cars without gas gauges or speedometers. Was the used-car salesperson ethical in taking advantage of the naive buyer?

Evans, Kevin R wrote:
> As we are just switching to OMEGAMON, and almost up to implementation
> of our first user to come into a new zLinux front end, can you give
> any further details on your comment below?
-snip-
Re: linux performance behind load balancer
There are some issues with WAS right now that seriously impact Linux under z/VM. Rob's out of town; he can explain better. The problem is that the current JDK polls every 10ms, which means the WAS servers stay in queue. We have been seeing the virtual-to-real storage overallocation ratios that sites can attain dropping, and traced it down to servers not dropping from queue. Rob tracked it down to the WAS polling. We're hoping for relief next year. So be careful about the performance feecher of 6.1.

Marcy Cortes wrote:
> I'm pretty sure all of production will be SLES9x within the next 2
> months - woo hoo! The promise of better performance from WAS 6.1 and
> SLES9x saving them a few IFLs is finally getting their attention.
-snip-
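The "stay in queue" symptom can be seen from the z/VM side. A sketch, assuming a suitably privileged user; the exact output format varies by z/VM level:

    CP INDICATE QUEUES EXP

A WAS guest that shows up in the dispatch list around the clock, even when it has no real work, is exhibiting the polling behavior described above; an idle guest should drop from queue.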
Re: linux performance behind load balancer
>>> On Fri, Sep 14, 2007 at 4:11 PM, in message <[EMAIL PROTECTED]>, barton <[EMAIL PROTECTED]> wrote:
> There are some issues with WAS right now that seriously impact Linux
> under z/VM. Rob's out of town; he can explain better. The problem is
> that the current JDK polls every 10ms, which means the WAS servers
> stay in queue.
-snip-

It was Rob working with me on the Linux/390 wiki system that led him to the discovery that the IBM JDK was issuing 10ms sleeps. It wasn't just in the newer versions of the JDK; it was in the 1.4.2 ones as well. So, upgrading to a newer version of WAS and its associated Java shouldn't be any worse in that regard.

Mark Post
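The 10ms sleeps can be observed from inside the guest with strace. A sketch only: the PID lookup via pgrep assumes the JVM process is named "java", and depending on the JVM the timer polls may show up as short-timeout poll or futex calls rather than nanosleep:

    # Attach to the running JVM and watch for a steady drumbeat of
    # ~10ms sleeps; each one wakes the guest and keeps it in queue.
    strace -f -tt -e trace=nanosleep -p $(pgrep -f java | head -1)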
Re: linux performance behind load balancer
Production I'm not so worried about, because it has adequate capacity (no paging) and the servers run all the time anyway, so they don't drop from queue because of real work. They've benchmarked and measured (with Velocity tools :) the differences between SLES9x/WAS6 and SLES8/WAS5, and see significant differences. We'll let you know for sure next week, with the real workload going through. But this might explain the increased paging load on our test system, which was already bursting at the seams.

Do you know what level of the JDK that started in?

Marcy Cortes

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of barton
Sent: Friday, September 14, 2007 1:12 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

> There are some issues with WAS right now that seriously impact Linux
> under z/VM. ... The problem is that the current JDK polls every 10ms,
> which means the WAS servers stay in queue.
-snip-
Re: linux performance behind load balancer
>>> On Fri, Sep 14, 2007 at 4:57 PM, in message <[EMAIL PROTECTED]>, Marcy Cortes <[EMAIL PROTECTED]> wrote:
-snip-
> Do you know what level of the JDK that started in?

It's been around for a while. The version I was running at the time Rob first investigated was an older 1.4.2 release.

Mark Post
Re: linux performance behind load balancer
barton wrote:
> Forgive the soap box. This is old news. Linux process data in any
> virtual environment is wrong. ... IBM claims there is a fix in SLES10;
> this has never been validated in any presentation. I don't like claims
> that sound suspiciously like vaporware.

Your own comments don't sound any better, to me. Are you claiming that the accounting patches mentioned elsewhere don't work as claimed? Can you support that claim?

--
Cheers
John
Re: linux performance behind load balancer
On 9/14/07, Mark Post <[EMAIL PROTECTED]> wrote:
> It was Rob working with me on the Linux/390 wiki system that led him
> to the discovery that the IBM JDK was issuing 10ms sleeps. It wasn't
> just in the newer versions of the JDK; it was in the 1.4.2 ones as
> well. So, upgrading to a newer version of WAS and its associated Java
> shouldn't be any worse in that regard.

He's back now... ;-)

There are issues in some later JDK levels (not yet 1.4.2), and Velocity Software has a bypass for it - IBM is working on getting the bypass supported and will eventually fix the problem. And there are problems in applications. In the open-source Java application Mark and I looked at, I could identify the cause and repair it. With later versions of WAS (after 4.5) there are similar problems, and with closed source like WAS I can only identify the problem and yell at developers who may listen but do not understand. Because each Java application seems to ship with its own copy of the JVM, it's not always easy to tell whether it's the JVM or the application. But it may not be relevant where the problem is, unless you're in a position to fix it...

Oh, and part of IBM's DB2 also has such problems, even though it's not in Java. IMHO the cause is a skills problem with the developers: poorly written code that they can get away with on discrete servers, that is only slightly noticeable on simple virtualization like VMware, but that impacts scalability when you get as advanced as we can with z/VM. This is not easy stuff. Although Barton says people don't want to hear difficult explanations, I may need to sit down and write about it.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/