Re: Capacity Monitoring question

2011-03-04 Thread Ward, Mike S
Where can I get a copy of GG22-9299? I have looked at IBM, but I can't
seem to find it.




Re: Capacity Monitoring question

2011-03-04 Thread Bill Munson
I know that Bill Bitner has referenced this book in some of his 
performance talks.

If you know someone who is a member of CMG, they might be able to get you a copy. 


http://www.cmg.org/cgi-bin/search.cgi?np=4q=Ray%20Wickss=SRPDsu=title 

When I tried to open this one, I was asked to sign in: 
Performance Analysis For Capacity Planning - An Introduction 
June 2009 | By Ray Wicks 
... Capacity Planning - an Introduction Ray wicks IBM Capacity Planning 
... Jam in reference 1 and Borchetta and Wicks in reference 2) says that 
the rate at ... Planning, by Page Borchetta and Ray Wicks, IBM publication 
GG22-9299-04. 
 
good luck

Bill Munson 
Sr. z/VM Systems Programmer 
Brown Brothers Harriman & Co.
525 Washington Blvd. 
Jersey City, NJ 07310 
201-418-7588

President - MVMUA
http://www2.marist.edu/~mvmua/
LVM Program Officer - SHARE 
http://www.linkedin.com/in/BillMunson





Re: Capacity Monitoring question

2011-03-04 Thread Gregg
One can join CMG, or one can register, which allows access to (some of) the papers
that have been presented.  A search there shows that Mr. Wicks has presented and is
referenced.  GG22-9299 won't be there, but tons of capacity planning
ideas are.  It won't be any sort of cookbook, but you won't lack for reading
material on a cold winter night.

On Fri, Mar 4, 2011 at 10:57 AM, Bill Munson william.mun...@bbh.com wrote:

 I know that Bill Bitner has referenced this book in some of his performance
 talks.

 If you know someone that is a member of CMG they might can get you a copy.

 http://www.cmg.org/cgi-bin/search.cgi?np=4q=Ray%20Wickss=SRPDsu=title

 when I tried to open this book , I was asked to sign in

 Performance Analysis For Capacity Planning - An Introduction
 http://www.cmg.org/proceedings/1995/95INT009.pdf
 June 2009 | By Ray Wicks
 ... Capacity Planning - an Introduction Ray wicks IBM Capacity Planning ...
 Jam in reference 1 and Borchetta and Wicks in reference 2) says that the
 rate at ... Planning, by Page Borchetta and Ray Wicks, IBM publication
 GG22-9299-04.

 good luck


 --
 Gregg Reed
 No Plan, survives execution



Re: Capacity Monitoring question

2011-03-03 Thread George Henke/NYLIC
An operating system, be it z/VM or z/OS, will always try to drive the CPU to 
100%.

This is goodness.

So looking at max CPU will never tell you anything about CPU capacity. It 
is an almost meaningless metric that can at best be very deceptive, and relying 
on it is a common, innocent mistake and misconception.

What is needed is the Saturation Data Point (SDP), which is calculated as 
the historical average of the CPU peaks divided by the historical average of 
the CPU averages.

The operative word here is HISTORICAL, 6 months at least.

This will give you the peak-to-average ratio: 2:1, 3:2, or whatever.  It 
depends on the nature of the workloads and their variability.

It differs for every shop.

Once you know the peak-to-average ratio, all you need to do at any given point 
in time is look at the average instead of the peak, and that will tell you 
if you have reached the SDP and need more CPU.
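
As a rough illustration of the calculation described above (a sketch that is not part of the original post): the ratio is the historical average of the peaks divided by the historical average of the averages. The function names and the sample numbers are made up, and reading "reached the SDP" as "the average has reached 100% divided by the ratio" is an assumption.

```python
from statistics import mean


def peak_to_average_ratio(interval_peaks, interval_averages):
    """Historical average of CPU peaks divided by historical average of CPU averages."""
    return mean(interval_peaks) / mean(interval_averages)


def saturation_average(ratio, capacity_pct=100.0):
    """Average CPU % at which the peaks would reach capacity (assumed: capacity / ratio)."""
    return capacity_pct / ratio


def needs_more_cpu(current_average_pct, ratio, capacity_pct=100.0):
    """True once the observed average has reached the saturation average."""
    return current_average_pct >= saturation_average(ratio, capacity_pct)


# Made-up sample data: one peak and one average CPU % per day.
# A real history should cover at least 6 months, per the post.
daily_peaks = [90, 80, 100, 70]
daily_averages = [45, 40, 50, 35]

ratio = peak_to_average_ratio(daily_peaks, daily_averages)
print(ratio)                      # 2.0
print(saturation_average(ratio))  # 50.0
print(needs_more_cpu(60, ratio))  # True
print(needs_more_cpu(30, ratio))  # False
```

With a 2:1 ratio the average to watch is 50%, not 100%.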

To illustrate:

Let's say the CPU is pegged at 100% and averages 60%, though you ignore 
the average as unimportant, thinking you're configuring for the peak, not 
the average.

Let's also say that your historical peak-to-average ratio is 2:1, though 
you do not know that or consider it important at the time.

So the CIO orders a CPU upgrade, paying millions in TPV software charges 
for the upgrade.

After the upgrade, the CIO looks at the CPU and sees it is now maxing at 
60%. He is ecstatic because he thinks he has 40% CPU headroom, and with 
about 5-10% annual CPU growth he expects at least a 3-5 year life in the 
configuration.

Six months later, the batch window is expanding, batch is backing up, 
response time is degrading, CPU is maxing at 100%, and the CIO wants to 
know what happened.

With a 2:1 peak-to-average ratio, when the CPU maxes at:

60%, it averages 30%.
100%, it averages 50%.
120%, it averages 60%.

So the headroom, initially, was not 40% (100%-60%) as the CIO thought, but 
only 20% (50%-30%), which got absorbed in 6 months.

But why so quickly?  Why 20% in only 6 months?

Remember, before the upgrade the average CPU was 60%.

That means the maximum CPU was really 120%, not 100%, and there was 20% 
latent demand.  Machines do not report much more than 100% CPU.  Another 
reason not to be misled by max CPU.

So between the 5-10% normal CPU growth and the impact of the 20% latent 
demand, the config really had a life of only 6 months.
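
The arithmetic in this example can be checked with a few lines of code. This is only a sketch of the reasoning above, not something from the original post. The helper names are hypothetical, and the formulas (implied peak = average x ratio, headroom measured on the average) are my reading of the post.

```python
def implied_peak(average_pct, ratio):
    """Peak demand implied by an observed average and the peak-to-average ratio."""
    return average_pct * ratio


def latent_demand(average_pct, ratio, capacity_pct=100.0):
    """Demand above capacity that the machine cannot report."""
    return max(0.0, implied_peak(average_pct, ratio) - capacity_pct)


def average_headroom(average_pct, ratio, capacity_pct=100.0):
    """Room left on the average before the peaks hit capacity."""
    return capacity_pct / ratio - average_pct


RATIO = 2.0  # the 2:1 historical ratio from the example

# Before the upgrade: pegged at 100%, averaging 60%.
print(implied_peak(60, RATIO))      # 120.0
print(latent_demand(60, RATIO))     # 20.0

# After the upgrade: maxing at 60%, so averaging 30%.
print(100 - 60)                     # 40, the headroom the CIO thought he had
print(average_headroom(30, RATIO))  # 20.0, the headroom on the average
```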

True, we should always configure for the peaks, not averages, but it is 
only the average that will ever tell us when we are out of CPU, once we 
know the historical peak-to-average ratio.

R.J. Wicks, IBM, has written a classic manual on this: Balanced Systems 
and Capacity Planning, GG22-9299 (119 pages).






Capacity Monitoring question

2011-03-02 Thread new zvm
I'm relatively new to z/VM.  I have a couple of z/VM LPARs running Linux guests 
and have more coming.  

What I'm wondering is this:

What z/VM metrics are you monitoring, and what thresholds do you use as 
indicators that more capacity is needed, specifically for CPU and memory?

Thanks in advance

New 2 zVM


  

Re: Capacity Monitoring question

2011-03-02 Thread Scott Rohling

If you want help - you're gonna have to introduce yourself...

Scott Rohling










Re: Capacity Monitoring question

2011-03-02 Thread Nick Warren
Sorry, I didn't know the protocol.
 
Nick Warren - freelance consultant.  New to z/VM - experience with AIX, HPUX, 
MVS, Windows.

 
 
 
 
 
 
  

Re: Capacity Monitoring question

2011-03-02 Thread Tony Saul
We use Performance Toolkit with APPLDATA enabled; then from option 29 in Perf 
Toolkit we get:

   Linux screens selection
S Display     Description
. LINUX       RMF PM system selection menu
. LXCPU       Summary CPU activity display
. LXMEM       Summary memory util. activity display
. LXNETWRK    Summary network activity display

 Interval 02:11:28-08:44:10, on 2011/03/03  (CURRENT interval, select interim or average data)

Linux     Virt  ------------------ Total CPU Utilization (%) ------------------  - Processes: Current -  -- Average Running -  Nr of
Userid    CPUs  TotCPU  User  Kernel  Nice  IRQ  SoftIRQ  IOWait   Idle  Stolen  Runabl  Waiting  Total  1_Min  5_Min  15_Min  Users
System     2.0     4.4   2.3     1.9    .0   .0       .1      .9  193.2     1.6     2.0       .0  434.5    .08    .15     .12      4
DLVOMG01     2      .4    .2      .2    .0   .0       .0      .2  198.8      .6       2        0    215    .00    .00     .00

 Interval 02:11:28-08:44:10, on 2011/03/03  (CURRENT interval, select interim or average data)

          ---------------- Memory Allocation (MB) ----------------  -------- Swapping ---------  ----- Pages/s -----  - BlockIO --
Linux     ---- Main -----  ---- High -----          Buffers  Cache  -- Space (MB) -  -Pgs/sec--   Allo  -- Faults --  -- kB/sec --  Nr of
Userid    M_Total  %MUsed  H_Total  %HUsed  Shared  /CaFree   Used  S_Total  %SUsed    In   Out  cates  Major  Minor   Read  Write  Users
System       3516    98.2       .0      .0      .0    240.7   1855     1744      .1  .000  .000  331.1   .000  916.1  43.57  37.53      4
DLVOMG01     2007    98.1       .0      .0      .0    225.7   1495    256.0      .0  .000  .000  103.1   .000  229.5  55.16  18.51

 Regards,
Tony 



 
 
 
 
 
 





Re: Capacity Monitoring question

2011-03-02 Thread Nick Warren
Hi Tony, Thanks for the response.

I probably didn't ask the question(s) very well.  I'm working with a customer 
that has no capacity plan regarding the use of z/VM as a Linux host.  We're 
seeing both CPU and memory usage on the z/VM side increasing.  Performance on 
the Linux guests is acceptable at this time.

Aside from waiting for the Linux users to start complaining, what metrics and 
thresholds should I be tracking as early predictors of capacity problems?

Obviously if CPU usage is constantly 100% that's probably not good.  I'm 
currently watching CPU, IOWait and Stolen time but wonder if those are 
sufficient.  Any suggestion as to what a good maximum number is?

Memory is a larger concern. In a previous life as an MVS sysprog I would watch 
paging/swapping and delay times, among others.  Are there any rules of thumb 
regarding paging or swapping in z/VM?  Is there something better than 
paging/swapping for capacity prediction?

Thanks again,

Nick


  
  
  
  
  
  



  

Re: Capacity Monitoring question

2011-03-02 Thread Berry van Sleeuwen
Hi Nick,

We monitor VM on page usage and page I/O, and our guests on VM for queue and
storage usage (main, xstor and swap). We also monitor guest CPU usage
and metrics like the limit list. Linux memory is always at 100%, so there is no
sense in monitoring it there, but we do monitor swap usage. Linux CPU
gives bad numbers to start with (yes, even on current kernel levels they
are still wrong), so don't monitor CPU on the guests.

Actually, 100% CPU is not a bad thing at all. Where most OSes become
less responsive above 90%, z/VM will still give you good response even at
high numbers. We like to have it above 90%. Obviously you would need
some capacity for new guests, so when you are running 100% CPU all the
time there can be a case for an additional IFL. But also look at the
guests and determine if they are running processes you don't need or that
hurt overall performance. Watch your Linux guests for response times and
batch run times. Set a good relative share, and if that doesn't help you
could consider adding IFLs.

Keep VM paging below 50%, and add paging DASD when needed. We have a VM that
is overcommitted to 9:1. Our production Linux VM is at 2:1 with room to
spare. Expect even high page I/O rates; thousands of I/Os per second don't
have to be bad. Keep an eye on guests that are competing for storage. Loading
users and E-lists in particular can point to a resource problem. Try to fix it
on the guest first (eliminate processes, reduce memory sizes, etc.).
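
For anyone who wants to track these two rules of thumb in a script, here is a minimal sketch that is not part of the original post. It assumes the overcommit ratio means the sum of guest virtual storage divided by real storage, and that "paging below 50%" refers to paging space utilization. The function names and sample sizes are made up.

```python
def overcommit_ratio(guest_virtual_mb, real_storage_mb):
    """Sum of guest virtual storage divided by real storage (assumed definition)."""
    return sum(guest_virtual_mb) / real_storage_mb


def paging_space_ok(page_space_used_pct, limit_pct=50.0):
    """True while paging space utilization stays below the limit."""
    return page_space_used_pct < limit_pct


# Made-up guest sizes (MB) on a hypothetical LPAR with 4608 MB of real storage.
guests_mb = [2048, 4096, 2048, 1024]
print(overcommit_ratio(guests_mb, 4608))  # 2.0
print(paging_space_ok(35.0))              # True
print(paging_space_ok(50.0))              # False
```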

Make sure the guests don't stay in Q3. It will hurt other servers. So
eliminate unused processes, and don't use pings or other keep-alive tooling.
Be aware that most regular Linux tooling keeps the guest active.
Obviously when you are running batch the guest will stay in Q3, but then
it's in there for a reason.

Some of these issues are also covered on the linux-390 list
(http://www2.marist.edu/htbin/wlvindex?LINUX-390). Take a look over
there as well.

Regards, Berry.
