Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-24 Thread Agblad Tore
You can try getting more control of the environment.
We don't install all these standard 'Unix/Linux' packages in zLinux, because they
either don't fit in or give inaccurate data.
CPU load, for example, we get from z/VM instead, and the organisation here
has bought our arguments for that.
We select appropriate things to monitor: checks
that are valid and work without loading the CPU too much.
Yes, that is a balance, and we always try to minimize things. Just as was
said in this forum: we really need to think differently.
It is also true that we are now starting to get company from other virtual
environments that run into problems with resources.

So time is working for us :)


___
Tore Agblad
Volvo Information Technology
Infrastructure Mainframe Design  Development, Linux servers
Dept 4352  DA1S 
SE-405 08, Gothenburg  Sweden

Telephone: +46-31-3233569
E-mail: tore.agb...@volvo.com

http://www.volvo.com/volvoit/global/en-gb/


--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit
http://wiki.linuxvm.org/



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-20 Thread van Sleeuwen, Berry
You know it, I know it. But some people tend to believe only what they
*think* they know. In this case, unfortunately, the monitoring team is
regarded as the specialist and I'm 'only' a VM sysprog. I have proven *)
on several occasions that the numbers are off, in some cases even way off,
but still they are convinced the tooling on Linux is telling the truth.
It is hard to convince management that our VM numbers are more correct
when so-called specialists narrow their view to a single guest.
The blipping in particular is hard to explain when everybody else
says they don't see anything wrong (nothing wrong, no
problem, so stop complaining). Hence my question: how can I convince
them in a way I haven't thought of yet?

*) I once did an install in a small LPAR (small in CPU resources, that
is; storage was sufficient). The LPAR had so little MIPS available that any
Linux activity quickly drove the real CPU to 100%. One Linux guest
was running an install; the other two Linux guests were idle or next to
idle. The Performance Toolkit showed one server running at over
90% and the other two at 0.2%. The two idle Linux guests themselves, however,
each reported running at 100% CPU, while only the one other
guest was truly running at close to 100%. As long as the LPAR isn't
running at full load, the numbers stay more or less in line with the
truth. But once CP is deciding who gets the resources, Linux is clueless
as to what its actual resource usage is.
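
To make the guest-side view concrete, here is a minimal Python sketch that derives busy, steal and idle percentages from two samples of the aggregate "cpu" line in /proc/stat. The function names and the sample numbers are made up for illustration. It only shows what the Linux kernel itself accounts for; as noted elsewhere in this thread, those numbers (including steal) can be off, so the z/VM figures remain the reference.

```python
from typing import Dict

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Turn the aggregate 'cpu' line into a name -> ticks mapping."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer counters; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_shares(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Percentages of the interval spent busy, stolen and idle (guest view)."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle = delta["idle"] + delta["iowait"]
    steal = delta["steal"]
    busy = total - idle - steal
    return {
        "busy": 100.0 * busy / total,
        "steal": 100.0 * steal / total,
        "idle": 100.0 * idle / total,
    }
```

A tool that folds the steal share into "busy" reports a starved guest as a heavily loaded one, which is the kind of disagreement with the hypervisor numbers described above.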

Berry. 

-Original Message-
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of
Rich Smrcina
Sent: vrijdag 20 augustus 2010 1:39
To: LINUX-390@VM.MARIST.EDU
Subject: Re: How to convince others. Was: Re: mono keep guest active -
ban the blips.



When your monitoring department looks at top, vmstat and sar to detect
problems, don't forget the kernel numbers lie.  Even the new steal timer
is a little off.



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-20 Thread David Boyes
 If only the monitor could 'know' that the machine was running this batch
 load at a certain time of day and had an absolute share and was running
 100% for an extended period of time.  It could be set up to not send out
 alerts based on all of these criteria.  Wow!  That would be a very nice
 feature.

Nagios 3 has that feature. 



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-20 Thread Rich Smrcina

 It's smart enough to know that *z/VM* has allocated it an absolute share?


--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2011 - April 15-19, 2011 Colorado Springs, CO



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-20 Thread David Boyes
   It's smart enough to know that *z/VM* has allocated it an absolute
 share?

It does have the ability to set time of day/shift-based parameters. As to the 
z/VM part, come to OLF and see. 8-)



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-20 Thread Rogério Soares
David,

I'm confused now... will Nagios 3 be able to communicate with z/VM directly,
or are you talking about a specific plugin using vmcp or something like that?
Sorry if I'm asking something obvious...




Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-20 Thread Rogério Soares
Forget it, David... I've figured it out now.



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-20 Thread Rob van der Heij
On Fri, Aug 20, 2010 at 12:40 AM, Berry van Sleeuwen
berry.vansleeu...@xs4all.nl wrote:

 Nagios is in use at the server side. Each client (our servers) has the
 Nagios client, with scripting instead of the Nagios plugins, and sec.

While parts of the Nagios user interface are pretty slick, it just
does not scale. While the rather simple architecture does not help,
the real problem appears to be in the admins who keep adding
additional checks. You can do a lot of silly things on discrete
servers with 5% avg utilization, but that does not mean it is a smart
thing to do in a shared resource environment.

 Sec is in use for monitoring /var/log/messages. It makes the server
 go into Q3 and stay there, and it has quite some CPU load as well. Useful?
 I don't know, perhaps, but why burn so many cycles and keep busy all the
 time? I mean, how many messages can you write and consequently read? At
 least when we monitor the Linux console with PROP we won't have that
 much overhead.

It's probably polling with a very short delay while reading the open
file. Obviously it could have used a much longer delay. Which still is
pretty silly when nothing is happening in the system that writes data
into the log file.

You could be worse off. We ran into a commercial product that used
this to start a new log file at midnight:
 - sleep until 23:59:59
 - while time()  00:00 do ;
You can probably figure out why this process went into a busy wait for 24 hours ...
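
For contrast, a minimal Python sketch of waiting for midnight without spinning: compute the remaining time once and sleep for exactly that long. The loop condition quoted above is incomplete as archived, so this does not reconstruct the product's logic; the function names are hypothetical, and the sketch assumes naive local timestamps and ignores daylight-saving shifts.

```python
import datetime
import time


def seconds_until_next_midnight(now: datetime.datetime) -> float:
    """Seconds from 'now' (naive local time) until the next 00:00:00."""
    tomorrow = now.date() + datetime.timedelta(days=1)
    midnight = datetime.datetime.combine(tomorrow, datetime.time.min)
    return (midnight - now).total_seconds()


def wait_for_midnight(sleep=time.sleep, clock=datetime.datetime.now) -> None:
    """Block once for the whole remaining interval instead of polling the clock."""
    sleep(seconds_until_next_midnight(clock()))
```

Because the process sleeps for the full interval, the guest can go idle; there is no test that has to hit one exact second, so there is nothing to miss and nothing to spin on.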

We have used SCIF to route the Linux console logging into a PROP-like
service that checked for bad things and also allowed trusted processes
to issue privileged commands on the Linux guests. That's cheaper and
does not keep the Linux guest awake.

 The other part is scripting scheduled in cron to monitor the filesystem
 and processes. These jobs tend to run at the same time on all servers and
 have some CPU load as well. I did notice mon_fsstat and such, which
 have only a minor impact on the Linux system even though they write records
 every minute. So in this case: useful, yes, but at a cost.

So if you have monitor data telling you almost nothing was written to
disk, does it still make sense to frequently run commands to check
whether the file systems filled up? Similar reasoning for checking
installed software levels - if you know nobody issued privileged
commands since last time, why check again?
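
The reasoning can be sketched in a few lines of Python. Everything here is hypothetical: the FsState record, its field names and the safety margin are illustration only, and the write volume is assumed to come from monitor data collected outside the guest rather than from a command run inside it.

```python
from dataclasses import dataclass


@dataclass
class FsState:
    free_bytes: int            # free space found by the last real check
    written_since_check: int   # bytes written since then, per external monitor data


def needs_fs_check(state: FsState, safety_margin: float = 0.5) -> bool:
    """Re-run the expensive in-guest check only if writes could matter.

    If the data written since the last check is small compared to the
    free space seen back then, the file system cannot have filled up.
    """
    return state.written_since_check >= state.free_bytes * safety_margin
```

With 10 MB free and 4 KB written since the last look, the check is skipped and the guest is left alone; it only runs again once the writes approach the space that was left.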

Some of this really requires a different way of thinking. Not all the
teams that currently deploy a few Linux servers can make that change.
If they can't, it really hurts to let them dictate how one should
manage an order of magnitude more servers...

--
Rob van der Heij
Velocity Software
http://www.velocitysoftware.com/



How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-19 Thread Berry van Sleeuwen
That's a good way to make things clear. Especially to management.

Here is a challenge. We are in the process of enrolling new machines
into production. As part of that, they want to force us to install a
general monitoring tool (Nagios and local scripting). We noticed quite a
dramatic increase in resource usage: CPU at least doubles and the guests
all go to Q3. To our comments about wasted resources, poorer storage
handling, etc., management responds "so then we have to buy storage". So
we now have to write a business case for why we should NOT increase storage
to handle the load. What are convincing arguments? After a few years of
discussing this over and over again I'm out of ideas.

Thanks, Berry.


On 17-08-10 23:35, Barton Robinson wrote:
 The reason these blips are so virtual unfriendly - think about poor
 old z/vm storage management. We need to steal some pages for some real
 work going on.  Do we steal it from the server doing real
 transactions? or from the one that is blipping? oops, we can't tell
 the difference.




Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-19 Thread David Kreuter
Are Nagios and local scripts waking up needlessly? or are they doing
legitimate work even if it is wasteful?
David Kreuter




Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-19 Thread Rich Smrcina

 A 'general monitoring tool' is not a performance monitor.  In an environment
where efficient resource utilization is critical to the business, a means to
monitor:

- the performance of the virtual machine environment
- the virtual machines running in that environment
- potentially systems outboard from the environment

is paramount to a successful implementation on System z.  Additionally, you may
want to perform chargeback and accounting based on internal procedures that may
be in place.

Nagios doesn't provide the timing resolution or access to z/VM monitoring
resources, so it loses.



--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2011 - April 15-19, 2011 Colorado Springs, CO



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-19 Thread Berry van Sleeuwen
Nagios is in use at the server side. Each client (our servers) has the
Nagios client, with scripting instead of the Nagios plugins, and sec.

Sec is in use for monitoring /var/log/messages. It makes the server
go into Q3 and stay there, and it has quite some CPU load as well. Useful?
I don't know, perhaps, but why burn so many cycles and keep busy all the
time? I mean, how many messages can you write and consequently read? At
least when we monitor the Linux console with PROP we won't have that
much overhead.

The other part is scripting scheduled in cron to monitor the filesystem
and processes. These jobs tend to run at the same time on all servers and
have some CPU load as well. I did notice mon_fsstat and such, which
have only a minor impact on the Linux system even though they write records
every minute. So in this case: useful, yes, but at a cost.

Berry.



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-19 Thread Berry van Sleeuwen
True, it isn't. It's the replacement of an operator. The main issue here
is that it needs to raise tickets and gather reporting stats. For instance,
it raises a ticket at 100% CPU (and indeed, our ABS limithard machines do
raise tickets when they are running their batch... sigh) or when a
filesystem is at 100%. The reporting covers, for instance, CPU and
filesystem usage.

But indeed it can't provide insight into the performance of a guest, other
than detecting thresholds. And it doesn't have to, either: the monitoring
department can look at top, vmstat or sar to detect that kind of
problem should they need to (yeah right, as if that tells them all about the
entire environment).

Still, as for a case, this is a good point. We need to be able to
address performance-related monitoring and Nagios can't do that. Or at
least not within the scope of an entire LPAR.

Thanks, Berry.



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-19 Thread Rich Smrcina

 If your batch runs regularly or consistently drive some virtual machines to
100%, this may not signal a loop condition (which, I would guess, is why the
ticket is being raised).  Techs may grow conditioned to this and either take
longer to respond or eventually just outright 'ignore' the tickets, since the
'normal' course of action is to page for a condition that is unresolvable
without a larger share or a redistribution of the load.

If only the monitor could 'know' that the machine was running this batch load
at a certain time of day, had an absolute share, and was running at 100% for an
extended period of time.  It could be set up to not send out alerts based on all
of these criteria.  Wow!  That would be a very nice feature.
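
A rough Python sketch of such a rule, to show how little logic it takes. The BatchWindow type, the should_alert function and the threshold are hypothetical names for illustration; this is not how any particular monitoring product implements it, and the 'absolute share' flag is assumed to be supplied from the z/VM side.

```python
import datetime
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BatchWindow:
    start: datetime.time
    end: datetime.time  # may be earlier than start: the window wraps past midnight

    def contains(self, t: datetime.time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        return t >= self.start or t < self.end


def should_alert(cpu_pct: float,
                 now: datetime.datetime,
                 windows: Sequence[BatchWindow],
                 has_absolute_share: bool,
                 threshold: float = 100.0) -> bool:
    """Raise a ticket for high CPU unless it is expected batch on a capped guest."""
    if cpu_pct < threshold:
        return False
    in_batch_window = any(w.contains(now.time()) for w in windows)
    return not (in_batch_window and has_absolute_share)
```

A guest pinned at 100% at 23:00 inside a 22:00 to 02:00 window stays quiet, while the same reading at noon, or on a guest without an absolute share, still raises a ticket.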

When your monitoring department looks at top, vmstat and sar to detect
problems, don't forget the kernel numbers lie.  Even the new steal timer is a
little off.



--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2011 - April 15-19, 2011 Colorado Springs, CO



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-19 Thread Rogério Soares
Berry,

to monitor some stats of the LPAR using Nagios, we set up a machine with a
high privilege class and wrote some scripts that use the vmcp module to query and
filter the information... I'm sure it is not the best way, but sometimes
we need to improvise :-)
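
A script of that kind might look roughly like the Python sketch below. It is not the script described above: it assumes the vmcp command is available in the guest, that the guest is allowed to issue CP INDICATE LOAD, and that the response contains a field of the form AVGPROC-nnn%. The thresholds are arbitrary; the return values follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import re
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_avgproc(indicate_output: str) -> int:
    """Extract the average processor utilisation from INDICATE LOAD output."""
    match = re.search(r"AVGPROC-(\d+)%", indicate_output)
    if match is None:
        raise ValueError("no AVGPROC field found in CP response")
    return int(match.group(1))


def classify(pct: int, warn: int = 80, crit: int = 95) -> int:
    """Map a utilisation percentage to a plugin status code."""
    if pct >= crit:
        return CRITICAL
    if pct >= warn:
        return WARNING
    return OK


def main() -> int:
    """Query CP once, print a one-line status and return the plugin code."""
    try:
        result = subprocess.run(["vmcp", "indicate", "load"],
                                capture_output=True, text=True, check=True)
        pct = parse_avgproc(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"UNKNOWN - {exc}")
        return UNKNOWN
    status = classify(pct)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    print(f"{label} - LPAR average processor utilisation {pct}%")
    return status
```

A wrapper script would end with sys.exit(main()). Running this from one dedicated guest at a modest interval, rather than from every server, keeps the cost of the check in one place.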



Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

2010-08-19 Thread Marcy Cortes
It'd be even cooler if your monitor could learn a virtual machine's normal or
expected activity pattern by time of day / day of week and then signal things
that are out of the ordinary.  Like the batch activity that was supposed to have been
running but took an unexpected low address protection exception, so CPU dived
to .5%, or the online server whose new code release put it into an occasional
loop and chewed an engine for a while.  (Real-world examples from, oh, the last 3
weeks :).
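
As a rough idea of what 'learning the pattern' could mean, here is a small Python sketch that keeps CPU samples per (weekday, hour) slot and flags readings far from that slot's history. The class name, the minimum sample count and the tolerance are hypothetical illustration values, and a mean plus standard deviation is just one simple way to define 'out of the ordinary'.

```python
import statistics
from collections import defaultdict


class HourlyBaseline:
    """Remembers CPU samples per (weekday, hour) slot and flags outliers."""

    def __init__(self, min_samples: int = 4, tolerance: float = 3.0,
                 min_spread: float = 1.0) -> None:
        self.min_samples = min_samples
        self.tolerance = tolerance
        self.min_spread = min_spread  # floor so a perfectly flat history still allows noise
        self._samples = defaultdict(list)

    def observe(self, weekday: int, hour: int, cpu_pct: float) -> None:
        self._samples[(weekday, hour)].append(cpu_pct)

    def is_unusual(self, weekday: int, hour: int, cpu_pct: float) -> bool:
        history = self._samples.get((weekday, hour), [])
        if len(history) < self.min_samples:
            return False  # not enough history to judge this slot
        mean = statistics.fmean(history)
        spread = max(statistics.pstdev(history), self.min_spread)
        return abs(cpu_pct - mean) > self.tolerance * spread
```

With a history of about 96% CPU on Thursdays at 02:00, a reading of 0.5% in that slot is flagged, which covers the 'batch that died quietly' case as well as the unexpected loop.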

The business of triggering on error messages is always a reactive thing.  You
get a message, you have a big problem because a bad message went unnoticed for
hours and something on down the line failed, and people play cleanup.  You add
paging automation around that message for the next time...

All of this systems automation software could be a lot smarter... 


Marcy 


