linux performance behind load balancer
Greetings,

As part of our Linux proof-of-concept project we built a new instance of the servers which provide our big student services application. The application runs on Oracle Web Application Server. The zLinux instance is running pretty much alone on a z800 IFL and has oodles of real memory. The application only accepts work from 7am to midnight; the rest of the time it responds to any queries by putting up a page listing the hours of availability.

The Linux userid running the application was using about 3-4% of the CPU. The day we added our instance to the (external) load balancer, its base CPU consumption went to 18% of the IFL, even during application downtime. It does seem to be able to do its share of the work by using an additional 15% of the CPU when the application is open, but we are puzzled that the polls by the load balancer seem to eat so much of a z800 IFL. The participating standalone (Dell) boxes get polled too, but run at <1% during downtime.

Our IBM business partner is helping us investigate, but I thought I'd ask this forum of experienced users whether you've seen/conquered performance problems running behind load balancers, or have an opinion about how much work will fit on a z800.

I have been told that the load-balancer polls are an exchange of Hello/Server Hello packets on port 443 (not a full-blown SSL handshake) every 2 seconds.

thanks, kate

Kate Riggsby
University of Tennessee
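A quick way to confirm on the wire what the health checks actually do, and how often they really arrive, is a packet trace on the guest. This is only a sketch: the interface name eth0 is an assumption, so substitute your qeth/OSA device.

    # Capture 40 packets of health-check traffic on the HTTPS port and
    # check the probe interval and whether a handshake really occurs.
    tcpdump -nn -i eth0 -c 40 'tcp port 443'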
Re: linux performance behind load balancer
I don't know how helpful this is, but we do run F5 load balancers in front of our biggest app. There are 2 servers hitting us every 5 seconds for HTTP and 2 for HTTPS, so a total of 4 hits every 5 seconds. But it runs across 17 z9 EC IFLs and there's never an idle time, so I couldn't really tell you exactly how much CPU that accounted for, but ...

Very rough math here ... We get about 130 TPS at 60% busy, so 2 TPS is about 1% busy. 1% of 17 IFLs = .17 IFL, or 17% of an IFL -- if the trans were full blown, but you said they are not full-blown trans ... (And yes, they are fat, CPU-intensive trans, in case anyone wonders :)

Can you take your interval from 2 seconds up to a higher number and see what happens? You can also check in your HTTP logs how often they really do hit you.

Marcy Cortes

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Kate Riggsby
Sent: Monday, September 10, 2007 10:21 AM
To: LINUX-390@VM.MARIST.EDU
Subject: [LINUX-390] linux performance behind load balancer

-snip-
Re: linux performance behind load balancer
Marcy Cortes wrote:
> Can you take your interval from 2 seconds up to a higher number and
> see what happens? You can also check in your HTTP logs how often they
> really do hit you.

... and also use iptables to drop the packets, and see what that does.

--
Cheers
John

Please do not reply off-list
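A minimal sketch of John's suggestion. The balancer address 192.0.2.10 is a placeholder, and dropping the probes will make the balancer mark this node down, so only try it in a window where that is acceptable:

    # Drop the balancer's health probes to port 443 and watch whether
    # the guest's base CPU consumption falls back toward the old 3-4%.
    iptables -A INPUT -s 192.0.2.10 -p tcp --dport 443 -j DROP

    # After observing for a few minutes, remove the rule again:
    iptables -D INPUT -s 192.0.2.10 -p tcp --dport 443 -j DROP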
Re: linux performance behind load balancer
On 9/10/07, Kate Riggsby <[EMAIL PROTECTED]> wrote:
> I have been told that the load-balancer polls are an exchange of
> Hello/Server Hello packets on port 443 (not a full-blown SSL handshake)
> every 2 seconds.

Although you say there's enough real memory, it may be that the system is not configured correctly and still pages the Linux guests. Your performance monitor should be able to provide more data than what you mention in your post. You'd need to see whether it's indeed these two Linux servers that consume the extra cycles, and if so, see which processes are doing that in Linux. I would not expect that opening a connection on port 443 and ending it would cause a lot of CPU activity (unless it triggers firewalls in Linux).

I think I read from your post that there's one IFL on the z800. That means that you probably don't make things go faster by spreading the load over multiple virtual machines (actually, you will make it slower). The folks who came up with the model of probing port 443 may have had a different failure model in mind than what's applicable to running two Linux virtual machines on the same z/VM (but I also know that changing such things is sometimes a nasty fight).

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
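For the "which processes" step, standard procps tools inside the guest give a first cut. A sketch; keep in mind that on SLES9 these percentages are not corrected for virtualization, as discussed later in this thread:

    # List the top CPU consumers, highest first.
    ps -eo pid,comm,pcpu,cputime --sort=-pcpu | head -15

    # Or watch interactively with a 5-second sample interval.
    top -d 5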
Re: linux performance behind load balancer
>>> On Mon, Sep 10, 2007 at 1:20 PM, in message <[EMAIL PROTECTED]>, Kate Riggsby <[EMAIL PROTECTED]> wrote:
-snip-
> The linux userid running the application was using about 3-4% of the
> cpu. The day we added our instance to the (external) load balancer its
> base cpu consumption went to 18% of the ifl, even during application
> downtime.

Kate,

What distribution (and version) are you running? What performance monitor are you using? As Rob said, much more information is needed to even start to figure this out. It doesn't seem at all reasonable to me that simple https connections would drive so much CPU utilization (although https uses more CPU than simple http, due to the SSL component).

Is this running on z/VM? If so, how big is the guest? How much virtual storage did you give it? What does "cat /proc/meminfo" show you?

Mark Post
Re: linux performance behind load balancer
Thank you all for sharing experiences and for advice. It gives me hope there may be a way around my brick wall!

Rob, page-in is what this problem feels like. But the LPAR has 6G/2G of main/xstor, the total virtual storage of all the guests together is 3264M, and of that the problem Linux guest has 2G. VM reports 0 paging. Just to clarify, there is one IFL on the z800. The LPAR is running the regular VM service machines, two small non-load-balanced Linuxes, and then this problem instance. The Linux I'm talking about is load-balanced with five standalone (not on VM) boxes. Our VM Linux instance shouldn't be running a firewall, but it's another thing I should verify.

Mark, the LPAR is running z/VM 5.2 at 0601. The Linux guests are running SLES9 SP3 (64-bit). The system is using Performance Toolkit. cat /proc/meminfo shows:

    MemTotal:      2050128 kB    LowFree:        343412 kB
    MemFree:        343412 kB    SwapTotal:      475852 kB
    Buffers:        143588 kB    SwapFree:       475852 kB
    Cached:        1128868 kB    Dirty:             444 kB
    SwapCached:          0 kB    Writeback:           0 kB
    Active:        1035208 kB    Mapped:         279544 kB
    Inactive:       486132 kB    Slab:           163204 kB
    HighTotal:           0 kB    Committed_AS:  2997492 kB
    HighFree:            0 kB    PageTables:       2496 kB
    LowTotal:      2050128 kB    VmallocTotal: 4292861952 kB
    VmallocUsed:      2532 kB
    VmallocChunk: 4292859180 kB

thanks, kate

On 9/11/07, Rob van der Heij wrote:
> Although you say there's enough real memory, it may be that the system
> is not configured correctly and still pages the Linux guests. Your
> performance monitor should be able to provide more data than what you
> mention in your post.
-snip-
Re: linux performance behind load balancer
A decent performance monitor (ESALPS comes to mind) will tell you exactly what processes are using the CPU and exactly how much. Have you considered running a decent performance monitor?

Kate Riggsby wrote:
> Thank you all for sharing experiences and for advice. It gives me hope
> there may be a way around my brick wall!
-snip-
Re: linux performance behind load balancer
On Thursday, 09/13/2007 at 10:22 EDT, barton <[EMAIL PROTECTED]> wrote:
> A decent performance monitor (ESALPS comes to mind) will tell you
> exactly what processes are using the CPU and exactly how much. Have
> you considered running a decent performance monitor?

Finishing the thought, IBM's OMEGAMON comes to mind as well. There's more than one "decent" performance monitor Out There, so shop and compare.

Real point: Successful z/VM+Linux deployments include, among other things, tools that can monitor resource consumption of the box, your LPAR, and your Linux guests. But as has been noted, while that function is necessary, it is not, by any reasonable measure, sufficient. You must also be able to correlate that with information on what's going on *inside* the guest. IMO, they should be part of POCs, too.

Alan Altmark
z/VM Development
IBM Endicott
Re: linux performance behind load balancer
As well as inside your app server, if you are using one of those. It's easy to create bad Java or a misconfigured WAS :). Introscope and ITCAM are 2 examples of those monitors.

Marcy Cortes

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Alan Altmark
Sent: Thursday, September 13, 2007 7:56 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

-snip-
Re: linux performance behind load balancer
On 9/13/07, Alan Altmark <[EMAIL PROTECTED]> wrote:
> Finishing the thought, IBM's OMEGAMON comes to mind as well. There's
> more than one "decent" performance monitor Out There, so shop and
> compare.

But since that will present an incorrect CPU breakdown per Linux process, it may lead to wrong conclusions. ESALPS will correct the CPU usage for virtualization effects.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
Re: linux performance behind load balancer
On Thursday, 09/13/2007 at 04:07 EDT, Rob van der Heij <[EMAIL PROTECTED]> wrote:
> But since that will present an incorrect CPU breakdown per Linux
> process, it may lead to wrong conclusions. ESALPS will correct the CPU
> usage for virtualization effects.

SLES 10 and RHEL 5 correct for the virtualization effects, and OMEGAMON gives the "normalized" numbers. But the importance of that depends on what you want to know, doesn't it? If you're interested in which Linux process is hogging the guest, the absolute number is irrelevant.

It *is* true that for capacity planning and chargeback you need a better absolute number than what older distros provide. As a result, we decided to update OMEGAMON to normalize those numbers for guests that don't generate the more accurate data. We hope to deliver that late this year or early next.

Alan Altmark
z/VM Development
IBM Endicott
Re: linux performance behind load balancer
>>> On Thu, Sep 13, 2007 at 9:38 AM, in message <[EMAIL PROTECTED]>, Kate Riggsby <[EMAIL PROTECTED]> wrote:
-snip-
> Mark, the LPAR is running z/VM 5.2 at 0601. The Linux guests are
> running SLES9 SP3 (64-bit). The system is using Performance Toolkit.
> cat /proc/meminfo shows:
-snip-

As lots of other people have already said (and with whom I agree), you really need some sort of performance monitor to figure out problems such as this.

One thing I will say regardless of that is that it looks like you have almost half a gig of inactive pages in your guest. That means you can likely reduce your guest size by about that much (subject to subsequent measurement of the results), and see if your Linux paging rates go up a whole bunch. The fact that your system is using about 1GB of storage for caching tells me that they likely won't. (And don't confuse page space in use with paging I/O _rates_. The first is OK; the second needs to be watched closely.) Doing this is not likely to have an impact on your immediate problem, but keeping your guest sizes as small as possible is a good habit to get into now.

Second, since this is a brand new install, why are you using SLES9 and not SLES10? ISV certifications and such are a perfectly good reason, of course, but if there's nothing like that standing in the way, I would recommend using SLES10 SP1.

Mark Post
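The arithmetic Mark is doing can be pulled straight from /proc/meminfo. A small sketch; the field names are the standard 2.6-kernel ones and the MB rounding is approximate:

    # Inactive pages are candidates for trimming the guest size; a large
    # Cached figure suggests trimming won't immediately force paging.
    awk '/^(MemTotal|MemFree|Cached|Inactive):/ \
        { printf "%-12s %6.0f MB\n", $1, $2/1024 }' /proc/meminfo

On Kate's numbers this reports roughly Inactive 475 MB ("almost half a gig") and Cached 1102 MB ("about 1GB").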
Re: linux performance behind load balancer
>>> On Thu, Sep 13, 2007 at 9:45 PM, in message <[EMAIL PROTECTED]>, Alan Altmark <[EMAIL PROTECTED]> wrote:
-snip-
> SLES 10 and RHEL 5 correct for the virtualization effects and OMEGAMON
> gives the "normalized" numbers.

Unfortunately in this case, SLES9 is being used, so the incorrect performance data is a problem. (One of the fairly numerous reasons I would recommend using SLES10 SP1.)

> But the importance of that depends on what you want to know, doesn't
> it? If you're interested in which Linux process is hogging the guest,
> the absolute number is irrelevant.

With the way things were designed in the 2.4 and earlier 2.6 kernels, that might be somewhat suspect as well. I've seen cases where the "bad guy" process wasn't easy to find.

Mark Post
Re: linux performance behind load balancer
On 9/14/07, Alan Altmark <[EMAIL PROTECTED]> wrote:
> But the importance of that depends on what you want to know, doesn't
> it? If you're interested in which Linux process is hogging the guest,
> the absolute number is irrelevant.

But if you're comparing usage before and after some configuration change, it does become important. Simply by generating load in other virtual machines that were idle before, Linux will think the real business process takes more CPU resources per transaction than it did before. And Linux tools will tell you so, even though it is not true. That's what may make you bark up the wrong tree.

But you *are* very right that a performance monitor should be part of your Proof of Concept. We don't see a PoC fail these days because software does not work or cannot be found. We see it fail because people suffer from poor performance and have nobody to turn to for help in that new environment. So they get folks with Linux skills on discrete servers, and those folks do all the wrong things, because tuning Linux on z/VM is rarely intuitive. Or folks use the wrong tools to measure, and draw the wrong conclusions about the capacity of their installation and their TCO if they grow it.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
Re: linux performance behind load balancer
Rob,

As we are just switching to OMEGAMON, and almost up to implementation of our first user to come into a new zLinux front end, can you give any further details on your comment below?

Thanks,
Kevin

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Rob van der Heij
Sent: Thursday, September 13, 2007 4:07 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: linux performance behind load balancer

> But since that will present an incorrect CPU breakdown per Linux
> process, it may lead to wrong conclusions. ESALPS will correct the CPU
> usage for virtualization effects.

-snip-
Re: linux performance behind load balancer
On Friday, 09/14/2007 at 02:19 EDT, Rob van der Heij <[EMAIL PROTECTED]> wrote:
> But you *are* very right that a performance monitor should be part of
> your Proof of Concept. We don't see a PoC fail these days because
> software does not work or cannot be found. We see it fail because
> people suffer from poor performance and have nobody to turn to for
> help in that new environment.
-snip-

Amen.

Alan Altmark
z/VM Development
IBM Endicott
Re: linux performance behind load balancer
>>> On Fri, Sep 14, 2007 at 6:48 AM, in message <[EMAIL PROTECTED]>, "Evans, Kevin R" <[EMAIL PROTECTED]> wrote:
> As we are just switching to OMEGAMON, and almost up to implementation
> of our first user to come into a new zLinux front end, can you give
> any further details on your comment below?

Prior to the kernels used in SLES10 and RHEL5, the way CPU consumption was tracked by the Linux kernel didn't take into account that the system might be running in a shared/virtualized environment. The assumption in place (valid until LPARs, z/VM, VMware, and Xen came along) was that the kernel was in complete control of the hardware, so any passage of time between the last clock value taken and the current one was assigned to whatever process was dispatched in the interval. The problem being, of course, that the virtual machine/LPAR might not have been running at all during that time. So, Linux could report that the CPU was 100% busy when in fact it was only being dispatched, for example, 3% of the time.

Of the various performance monitors that were being marketed for mainframe Linux, only Velocity Software's product combined the Linux data with the z/VM monitor data and normalized the Linux values to be correct. (Obviously this only worked in an environment where z/VM was being used as the hypervisor.) This was a big factor in many decisions of which monitor to choose. Since the release of the "cpu accounting" patches, and their incorporation into SLES and RHEL, that's no longer the case, unless you're still running SLES8 (Hi, Marcy!) or SLES9 (Hi, almost everyone else!), or RHEL3 or 4. Now the decision is based on more traditional criteria, as opposed to being right or very wrong.

If you have a userid and password to access the SHARE proceedings, you can see Martin Schwidefsky's presentation on this at http://www.share.org/member_center/open_document.cfm?document=proceedings/SHARE_in_Seattle/S9266XX172938.pdf (I have no idea why I didn't ask Martin for a copy of that for the linuxvm.org web site. Rats.)

Mark Post
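On kernels that carry the CPU accounting changes, time stolen by the hypervisor is reported separately, which is what makes the per-process numbers honest again. A minimal sketch of checking it, assuming the "steal" column in /proc/stat (present on the 2.6 kernels in SLES10/RHEL5, but not on SLES9):

    # Fields after "cpu": user nice system idle iowait irq softirq steal.
    # A large steal figure is time the guest was not dispatched at all --
    # time an old kernel would have charged to its own processes.
    awk '/^cpu / { printf "steal: %d of %d total jiffies\n", \
        $9, $2+$3+$4+$5+$6+$7+$8+$9 }' /proc/stat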
Re: linux performance behind load balancer
(Hi Mark!)

That's the disadvantage of starting before everyone else and having too many servers :) At least I've killed the SLES7s! The problem with going from SLES8 to SLES9x is that it's a new server. That requires the cooperation of the users, and they don't like to do that if everything is all hunky-dory. They have other things to do (so they tell me). I'm hoping SLES9x to SLES10x is a true upgrade and we can do it without bothering the applications folks. That's a project to figure out over the holiday freeze, though.

I'm pretty sure all of production will be SLES9x within the next 2 months - woo hoo! The promise of better performance from WAS 6.1 and SLES9x saving them a few IFLs is finally getting their attention.

(See you next week.)

Marcy Cortes

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Mark Post
Sent: Friday, September 14, 2007 9:20 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

-snip-
Re: linux performance behind load balancer
Forgive the soap box. This is old news. Linux process data in any virtual environment is wrong. This was measured and presented in a production environment as off by an order of magnitude. This is true for all releases and distributions of Linux. IBM claims there is a fix in SLES10; this has never been validated in any presentation. I don't like claims that sound suspiciously like vaporware.

Why is this data bad? It's useless:

1) Imagine someone doing application tuning using this data and thinking they've improved their app performance - their data is wrong and leads to the wrong conclusion.

2) If system utilization is high and you log on to any Linux using Linux tools, you might think "top" is the hog, but you will make very poor choices of which processes or which Linux server to kill.

3) If you are making PoC decisions based on this data, you will think the mainframe is dog slow. This is bad for all of us and leads to poor financial decisions. It gives a very good platform a bad image.

6 years ago, when Velocity Software analyzed this, we found a way to correct and record the process data (for all Linux releases and distributions), thus making this data useful for all of the above. No other vendor (other than Velocity Software) has presented a solution for this problem - and validated it in any public forum.

So when I hear about installations planning on dependencies on garbage data, I think about how many people would drive cars without gas gauges or speedometers. Was the used-car salesperson ethical in taking advantage of the naive buyer?

Evans, Kevin R wrote:
> As we are just switching to OMEGAMON, and almost up to implementation
> of our first user to come into a new zLinux front end, can you give
> any further details on your comment below?
-snip-
Re: linux performance behind load balancer
There are some issues with WAS right now that seriously impact Linux under z/VM. Rob's out of town; he can explain better. The problem is that the current JDK polls every 10ms, which means the WAS servers stay in queue. We have been seeing the virtual-to-real storage overallocation ratios that sites can attain dropping, and traced it down to servers not dropping from queue. Rob tracked it down to the WAS polling. We're hoping for relief next year. So be careful about the performance feecher of 6.1.

Marcy Cortes wrote:
> I'm pretty sure all of production will be SLES9x within the next 2
> months - woo hoo! The promise of better performance from WAS 6.1 and
> SLES9x saving them a few IFLs is finally getting their attention.
-snip-
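The "stay in queue" symptom can be seen from the z/VM side. A sketch, assuming a suitably privileged user; the exact output format varies by z/VM level:

    CP INDICATE QUEUES EXP

A WAS guest that shows up in the dispatch list around the clock, even when it has no real work, is exhibiting the polling behavior described above; an idle guest should drop from queue.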
Re: linux performance behind load balancer
>>> On Fri, Sep 14, 2007 at 4:11 PM, in message <[EMAIL PROTECTED]>, barton <[EMAIL PROTECTED]> wrote:
> There are some issues with WAS right now that seriously impact Linux
> under z/VM. Rob's out of town; he can explain better. The problem is
> that the current JDK polls every 10ms, which means the WAS servers
> stay in queue.
-snip-

It was Rob working with me on the Linux/390 wiki system that led him to the discovery that the IBM JDK was issuing 10ms sleeps. It wasn't just in the newer versions of the JDK; it was in the 1.4.2 ones as well. So, upgrading to a newer version of WAS and its associated Java shouldn't be any worse in that regard.

Mark Post
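The 10ms sleeps can be observed from inside the guest with strace. A sketch only: the PID lookup via pgrep assumes the JVM process is named "java", and depending on the JVM the timer polls may show up as short-timeout poll or futex calls rather than nanosleep:

    # Attach to the running JVM and watch for a steady drumbeat of
    # ~10ms sleeps; each one wakes the guest and keeps it in queue.
    strace -f -tt -e trace=nanosleep -p $(pgrep -f java | head -1)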
Re: linux performance behind load balancer
Production I'm not so worried about, because it has adequate capacity (no paging) and the servers run all the time anyway, so they don't drop from queue because of real work. They've benchmarked and measured (with Velocity tools :) the differences between SLES9x/WAS6 and SLES8/WAS5, and see significant differences. We'll let you know for sure next week, with the real workload going through. But this might explain the increased paging load on our test system, which was already bursting at the seams.

Do you know what level of the JDK that started in?

Marcy Cortes

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of barton
Sent: Friday, September 14, 2007 1:12 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] linux performance behind load balancer

> There are some issues with WAS right now that seriously impact Linux
> under z/VM. ... The problem is that the current JDK polls every 10ms,
> which means the WAS servers stay in queue.
-snip-
Re: linux performance behind load balancer
>>> On Fri, Sep 14, 2007 at 4:57 PM, in message <[EMAIL PROTECTED]>, Marcy Cortes <[EMAIL PROTECTED]> wrote:
-snip-
> Do you know what level of the JDK that started in?

It's been around for a while. The version I was running at the time Rob first investigated was an older 1.4.2 release.

Mark Post
Re: linux performance behind load balancer
barton wrote:
> Forgive the soap box. This is old news. Linux process data in any
> virtual environment is wrong. ... IBM claims there is a fix in SLES10;
> this has never been validated in any presentation. I don't like claims
> that sound suspiciously like vaporware.

Your own comments don't sound any better, to me. Are you claiming that the accounting patches mentioned elsewhere don't work as claimed? Can you support that claim?

--
Cheers
John
Re: linux performance behind load balancer
On 9/14/07, Mark Post <[EMAIL PROTECTED]> wrote:
> It was Rob working with me on the Linux/390 wiki system that led him
> to the discovery that the IBM JDK was issuing 10ms sleeps. It wasn't
> just in the newer versions of the JDK; it was in the 1.4.2 ones as
> well. So, upgrading to a newer version of WAS and its associated Java
> shouldn't be any worse in that regard.

He's back now... ;-)

There are issues in some later JDK levels (not yet 1.4.2), and Velocity Software has a bypass for it - IBM is working on getting the bypass supported and will eventually fix the problem. And there are problems in applications. In the open-source Java application Mark and I looked at, I could identify the cause and repair it. With later versions of WAS (after 4.5) there are similar problems, and with closed source like WAS I can only identify the problem and yell at developers who may listen but do not understand. Because each Java application seems to ship with its own copy of the JVM, it's not always easy to tell whether it's the JVM or the application. But it may not be relevant where the problem is, unless you're in a position to fix it...

Oh, and part of IBM's DB2 also has such problems, even though it's not in Java. IMHO the cause is a skills problem with the developers: poorly written code that they can get away with on discrete servers, that is only slightly noticeable on simple virtualization like VMware, but that impacts scalability when you get as advanced as we can with z/VM. This is not easy stuff. Although Barton says people don't want to hear difficult explanations, I may need to sit down and write about it.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/