Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
You can try to get more control of the environment. We don't install all the standard 'Unix/Linux' packages on zLinux, because they either don't fit in or give inaccurate data. CPU load, for example, we get from z/VM instead, and our organisation has accepted our arguments for doing so. We select things to monitor that are valid and that work without loading the CPU too much.

Yes, that is a balance, and we always try to minimize things. Just as was said in this forum: we really need to think differently. It is also true that we are now getting company from other virtual environments that run into resource problems. So time is working for us :)

___
Tore Agblad
Volvo Information Technology
Infrastructure Mainframe Design Development, Linux servers
Dept 4352 DA1S
SE-405 08, Gothenburg Sweden
Telephone: +46-31-3233569
E-mail: tore.agb...@volvo.com
http://www.volvo.com/volvoit/global/en-gb/

-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Rob van der Heij
Sent: 20 August 2010 23:08
To: LINUX-390@VM.MARIST.EDU
Subject: Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

> Some of this really requires a different way of thinking. Not all the teams that currently deploy a few Linux servers can make that change. If they can't, it really hurts to let them dictate how one should manage an order of magnitude more servers...

--
For LINUX-390 subscribe / signoff / archive access instructions, send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit http://wiki.linuxvm.org/
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
You know it, I know it. But some people tend to believe only what they *think* they know. In this case, unfortunately, the monitoring team is regarded as the specialist and I'm 'only' a VM sysprog. I have proven *) on several occasions that the numbers are off, in some cases even way off, but they are still convinced that the tooling on Linux is telling the truth.

It is hard to convince management that our VM numbers are more accurate when the so-called specialists narrow their view to a single guest. The blipping in particular is hard to explain when everybody else says they don't see anything wrong (nothing wrong, no problem, so stop complaining). Hence my question: how do I convince them in a way I haven't thought of (yet)?

*) I once did an install in a small LPAR (small in CPU resources, that is; storage was sufficient). The LPAR had so few MIPS available that any Linux activity quickly drove the real CPU to 100%. One Linux guest was running an install; the other two Linux guests were idle or nearly idle. The Performance Toolkit showed one server running at over 90% and the other two at 0.2%. The two idle Linux guests themselves, however, both reported that they were running at 100% CPU, while only the one other guest was truly running at close to 100%. As long as the LPAR isn't running at full load, the numbers stay more or less in line with the truth. But once CP is deciding who gets the resources, Linux is clueless as to what its actual resource usage is.

Berry.

-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Rich Smrcina
Sent: Friday 20 August 2010 1:39
To: LINUX-390@VM.MARIST.EDU
Subject: Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

> When your monitoring department looks at top, vmstat and sar to detect problems, don't forget the kernel numbers lie. Even the new steal timer is a little off.
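For anyone who wants to show the monitoring team how much of a guest's "busy" time is really time the hypervisor gave to someone else, here is a minimal sketch that computes the steal share between two samples of the aggregate `cpu` line in `/proc/stat`. The helper names (`parse_cpu_line`, `steal_percent`) are made up for this example, and, as Rich notes, the steal counter itself is only approximate, so treat the result as an indication rather than as a replacement for the z/VM numbers.

```python
def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    if len(values) < 8:
        raise ValueError("this kernel does not report a steal counter")
    return values


def steal_percent(before, after):
    """Percentage of the ticks elapsed between two samples that were steal time.

    Counter order: user nice system idle iowait irq softirq steal [guest ...].
    Steal is index 7. Only the first eight counters are summed, because guest
    time is already included in user/nice.
    """
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    total = sum(a[:8]) - sum(b[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (a[7] - b[7]) / total


# Usage idea (hypothetical): read the first line of /proc/stat, wait a minute,
# read it again, and pass both strings to steal_percent().
```

A high steal percentage alongside "100% CPU" in top is the situation Berry describes: the guest believes it is busy while CP is actually running someone else.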
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
> If only the monitor could 'know' that the machine was running this batch load at a certain time of day and had an absolute share and was running 100% for an extended period of time. It could be set up to not send out alerts based on all of these criteria. Wow! That would be a very nice feature.

Nagios 3 has that feature.
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
It's smart enough to know that *z/VM* has allocated it an absolute share?

On 08/20/2010 05:13 AM, David Boyes wrote:
> > If only the monitor could 'know' that the machine was running this batch load at a certain time of day and had an absolute share and was running 100% for an extended period of time. It could be set up to not send out alerts based on all of these criteria. Wow! That would be a very nice feature.
>
> Nagios 3 has that feature.

--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2011 - April 15-19, 2011 Colorado Springs, CO
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
> It's smart enough to know that *z/VM* has allocated it an absolute share?

It does have the ability to set time of day/shift-based parameters. As to the z/VM part, come to OLF and see. 8-)
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
David, I'm confused now... Will Nagios 3 be able to communicate with z/VM directly, or are you talking about a specific plugin using vmcp or something like that? Sorry if I'm asking something obvious...

On Fri, Aug 20, 2010 at 11:12 AM, David Boyes dbo...@sinenomine.net wrote:
> > It's smart enough to know that *z/VM* has allocated it an absolute share?
>
> It does have the ability to set time of day/shift-based parameters. As to the z/VM part, come to OLF and see. 8-)
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
Forget it, David.. I've figured it out now...

2010/8/20 Rogério Soares rogerio.soa...@gmail.com
> David, I'm confused now... Will Nagios 3 be able to communicate with z/VM directly, or are you talking about a specific plugin using vmcp or something like that? Sorry if I'm asking something obvious...
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
On Fri, Aug 20, 2010 at 12:40 AM, Berry van Sleeuwen berry.vansleeu...@xs4all.nl wrote:

> Nagios is in use at the server side. Each client (our servers) has the Nagios client, with scripting instead of the Nagios plugins, and sec.

While parts of the Nagios user interface are pretty slick, it just does not scale. While the rather simple architecture does not help, the real problem appears to be the admins who keep adding additional checks. You can do a lot of silly things on discrete servers with 5% avg utilization, but that does not mean it is a smart thing to do in a shared resource environment.

> Sec is used to monitor /var/log/messages; it makes the server go into Q3 and stay there, and it has quite some CPU load as well. Useful? I don't know, perhaps, but why burn so many cycles and keep busy all the time? I mean, how many messages can you write and consequently read? At least when we monitor the Linux console with PROP we won't have that much overhead.

It's probably polling with a very short delay while reading the open file. Obviously it could have used a much longer delay. Which is still pretty silly when nothing is happening in the system that writes data into the log file.

You could be worse off. We ran into a commercial product that used this to start a new log file at midnight:

 - sleep until 23:59:59
 - while time() 00:00 do ;

You can probably figure out why this process went into a busy wait for 24 hours ...

We have used SCIF to route the Linux console logging into a PROP-like service that checked for bad things and also allowed trusted processes to issue privileged commands on the Linux guests. That's cheaper and does not keep the Linux guest awake.

> The other part is scripting scheduled in cron to monitor the filesystems and processes. These scripts tend to run at the same time on all servers and have some CPU load as well. I did notice mon_fsstat and the like; those have only a minor impact on the Linux system, and they even write records every minute. So in this case: useful, yes, but at a cost.

So if you have monitor data telling you almost nothing was written to disk, does it still make sense to frequently run commands to check whether the file systems filled up? Similar reasoning applies to checking installed software levels: if you know nobody issued privileged commands since last time, why check again?

Some of this really requires a different way of thinking. Not all the teams that currently deploy a few Linux servers can make that change. If they can't, it really hurts to let them dictate how one should manage an order of magnitude more servers...

--
Rob van der Heij
Velocity Software
http://www.velocitysoftware.com/
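To illustrate the midnight example: whatever the exact comparison in that product was, a loop that spins until the clock matches one particular instant will burn CPU for a full day if the preceding sleep overshoots. A minimal sketch of the alternative is below, assuming a process that only needs to wake up once at the next local midnight. The function names are invented for this example, and naive local datetimes are used, so a DST change during the wait is not handled.

```python
import time
from datetime import datetime, timedelta


def seconds_until_next_midnight(now):
    """Seconds from 'now' until the next 00:00:00 (always greater than zero)."""
    tomorrow = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (tomorrow - now).total_seconds()


def wait_for_midnight(clock=datetime.now, sleep=time.sleep):
    """Block until the next midnight without spinning.

    The target is computed once, and the test is 'has the target passed',
    not 'is it exactly 00:00', so an overshooting sleep cannot miss it.
    """
    start = clock()
    target = start + timedelta(seconds=seconds_until_next_midnight(start))
    while True:
        remaining = (target - clock()).total_seconds()
        if remaining <= 0:
            return
        sleep(remaining)  # one blocking sleep; the guest can go idle meanwhile
```

The loop normally runs its body once. It only goes around again if the sleep returns early, which keeps the guest out of the run queue for the whole wait.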
How to convince others. Was: Re: mono keep guest active - ban the blips.
That's a good way to make things clear, especially to management.

Here is a challenge. We are in the process of enrolling new machines into production. Part of that is that they want to force us to install a general monitoring tool (Nagios and local scripting). We noticed quite a dramatic increase in resource usage: CPU at least doubles and the guests all go to Q3. To our comments about wasted resources, poorer storage handling, etc., management responds "so then we have to buy storage". So we now have to write a business case for why we should NOT increase storage to handle the load.

What are convincing arguments? After a few years of discussing this over and over again I'm out of ideas.

Thanks, Berry.

On 17-08-10 23:35, Barton Robinson wrote:
> The reason these blips are so virtual unfriendly - think about poor old z/VM storage management. We need to steal some pages for some real work going on. Do we steal it from the server doing real transactions? Or from the one that is blipping? Oops, we can't tell the difference.
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
Are Nagios and the local scripts waking up needlessly? Or are they doing legitimate work, even if it is wasteful?

David Kreuter

-------- Original Message --------
Subject: How to convince others. Was: Re: mono keep guest active - ban the blips.
From: Berry van Sleeuwen berry.vansleeu...@xs4all.nl
Date: Thu, August 19, 2010 3:49 pm
To: LINUX-390@VM.MARIST.EDU

> Part of that is that they want to force us to install a general monitoring tool (Nagios and local scripting). We noticed quite a dramatic increase in resource usage: CPU at least doubles and the guests all go to Q3.
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
A 'general monitoring tool' is not a performance monitor. In an environment where efficient resource utilization is critical to the business, a means to monitor:

- the performance of the virtual machine environment
- the virtual machines running in that environment
- potentially systems outboard from the environment

is paramount to a successful implementation on System z. Additionally, you may want to perform chargeback and accounting based on internal procedures that may be in place. Nagios doesn't provide the timing resolution or access to z/VM monitoring resources, so it loses.

On 08/19/2010 02:49 PM, Berry van Sleeuwen wrote:
> Part of that is that they want to force us to install a general monitoring tool (Nagios and local scripting).

--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2011 - April 15-19, 2011 Colorado Springs, CO
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
Nagios is in use at the server side. Each client (our servers) has the Nagios client, with scripting instead of the Nagios plugins, and sec.

Sec is used to monitor /var/log/messages; it makes the server go into Q3 and stay there, and it has quite some CPU load as well. Useful? I don't know, perhaps, but why burn so many cycles and keep busy all the time? I mean, how many messages can you write and consequently read? At least when we monitor the Linux console with PROP we won't have that much overhead.

The other part is scripting scheduled in cron to monitor the filesystems and processes. These scripts tend to run at the same time on all servers and have some CPU load as well. I did notice mon_fsstat and the like; those have only a minor impact on the Linux system, and they even write records every minute. So in this case: useful, yes, but at a cost.

Berry.

On 19-08-10 22:04, David Kreuter wrote:
> Are Nagios and local scripts waking up needlessly? Or are they doing legitimate work even if it is wasteful?
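On the cron scripts that fire at the same moment on every guest: one common way to take the edge off is to have each script start with a per-host delay, so the load is spread over a window instead of landing on the LPAR all at once. The sketch below derives that delay from a hash of the host name; `splay_seconds` and the 300-second default window are assumptions for illustration, not something from the setup described above. It reduces the spike but does nothing about the total CPU the checks consume.

```python
import hashlib


def splay_seconds(hostname, window_seconds=300):
    """Deterministic per-host start delay in the range [0, window_seconds).

    The same host always gets the same delay, so its check interval stays
    regular, while different hosts land on different offsets.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Usage idea (hypothetical): at the top of the cron-driven check,
#   time.sleep(splay_seconds(socket.gethostname()))
# before doing any real work.
```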
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
True, it isn't. It's the replacement for an operator. The main issue here is that it needs to raise tickets and gather reporting stats. For instance, it raises a ticket at 100% CPU (and indeed, our ABS limithard machines do raise tickets when they are running their batch... sigh) or when a filesystem is at 100%. The reporting covers, for instance, CPU and filesystem usage.

But indeed, it can't provide insight into the performance of a guest, other than detecting thresholds. And it doesn't have to either; the monitoring department can look at top, vmstat or sar to detect that kind of problem should they need to (yeah right, then they know all about the entire environment).

Still, as for a case, this is a good point. We need to be able to address performance-related monitoring and Nagios can't do that. Or at least not within the scope of an entire LPAR.

Thanks, Berry.

On 19-08-10 22:12, Rich Smrcina wrote:
> A 'general monitoring tool' is not a performance monitor.
>
> Nagios doesn't provide the timing resolution or access to z/VM monitoring resources, so it loses.
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
If your batch runs regularly or consistently drive some virtual machines to 100%, this may not signal a loop condition (which, I would guess, is why the ticket is being raised). Techs may grow conditioned to this and either take longer to respond or eventually just 'ignore' the tickets outright, since the 'normal' course of action is to page for a condition that cannot be resolved without a larger share or a redistribution of the load.

If only the monitor could 'know' that the machine was running this batch load at a certain time of day, had an absolute share, and was running at 100% for an extended period of time. It could be set up to not send out alerts based on all of these criteria. Wow! That would be a very nice feature.

When your monitoring department looks at top, vmstat and sar to detect problems, don't forget the kernel numbers lie. Even the new steal timer is a little off.

On 08/19/2010 05:51 PM, Berry van Sleeuwen wrote:
> For instance, raise a ticket at 100% CPU (and indeed, our ABS limithard machines do raise tickets when they are running their batch..sigh.) or when a filesystem is at 100%.
>
> And it doesn't have to either, the monitoring department can look at top, vmstat or sar to detect that kind of problems should they need to (yeah right, then they know all about the entire environment).

--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2011 - April 15-19, 2011 Colorado Springs, CO
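The "monitor that knows" can be sketched in a few lines. This is only an illustration of the rule described above (suppress the 100% CPU alert when the guest has an absolute share and is inside its known batch window); `should_alert`, `in_window` and the parameters are hypothetical names, and how the monitor learns that a guest has an absolute share is left open, as it is in the thread.

```python
from datetime import datetime, time


def in_window(t, start, end):
    """True if time-of-day t lies in [start, end); the window may cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_alert(cpu_pct, now, batch_start, batch_end, has_absolute_share,
                 threshold=100.0):
    """Decide whether a CPU reading deserves a ticket.

    A guest capped by an absolute share that runs flat out during its batch
    window is behaving as designed, so no ticket is raised for it.
    """
    if cpu_pct < threshold:
        return False
    if has_absolute_share and in_window(now.time(), batch_start, batch_end):
        return False
    return True
```

For example, with a batch window of 22:00 to 04:00, a capped guest at 100% at 23:30 raises nothing, while the same reading at noon still raises a ticket.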
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
Berry, to monitor some LPAR stats using Nagios, we set up a machine with a high privilege class and wrote some scripts that use the vmcp module to query and filter the information... I'm sure it's not the best way, but sometimes we need to improvise :-)

On Thu, Aug 19, 2010 at 7:51 PM, Berry van Sleeuwen berry.vansleeu...@xs4all.nl wrote:
> We need to be able to address performance related monitoring and nagios can't do that. Or at least not within the scope of an entire LPAR.
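A rough sketch of what such a vmcp-based check could look like follows. It is not Rogério's script: the choice of the CP command `INDICATE LOAD`, the assumption that its response contains a field of the form `AVGPROC-nnn%`, and the warning/critical thresholds are all assumptions made for this example, so check the output format on your own z/VM level first. The exit codes 0/1/2/3 are the usual Nagios plugin convention for OK/WARNING/CRITICAL/UNKNOWN.

```python
import re
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def run_indicate_load():
    """Issue CP INDICATE LOAD through vmcp and return the response text.

    Requires the vmcp tool and a guest with a suitable privilege class.
    """
    result = subprocess.run(["vmcp", "indicate", "load"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_avgproc(text):
    """Return the AVGPROC percentage as an int, or None if it is not present.

    Assumes a field formatted like 'AVGPROC-038%'.
    """
    match = re.search(r"AVGPROC-(\d+)%", text)
    return int(match.group(1)) if match else None


def check(text, warn=80, crit=95):
    """Map the CP response text to a (exit_code, message) pair."""
    pct = parse_avgproc(text)
    if pct is None:
        return UNKNOWN, "UNKNOWN - no AVGPROC field found"
    if pct >= crit:
        return CRITICAL, "CRITICAL - AVGPROC %d%%" % pct
    if pct >= warn:
        return WARNING, "WARNING - AVGPROC %d%%" % pct
    return OK, "OK - AVGPROC %d%%" % pct
```

A wrapper would call `check(run_indicate_load())`, print the message and exit with the code. Running this from one designated guest, rather than from every guest, keeps the cost of the check in one place.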
Re: How to convince others. Was: Re: mono keep guest active - ban the blips.
It'd be even cooler if your monitor could learn a virtual machine's normal or expected activity pattern by time of day / day of week and then signal things out of the ordinary. Like the batch activity that was supposed to have been running but took an unexpected low address protection exception, so CPU dived to .5%, or the online server whose new code release put it into an occasional loop that chewed an engine for a while (real-world examples from, oh, the last 3 weeks :).

The business of triggering on error messages is always a reactive thing. You get a message, you have a big problem because a bad message went unnoticed for hours and something further down the line failed, and people play cleanup. You add paging automation around that message for the next time...

All of this systems automation software could be a lot smarter...

Marcy

-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Rich Smrcina
Sent: Thursday, August 19, 2010 4:39 PM
To: LINUX-390@vm.marist.edu
Subject: Re: [LINUX-390] How to convince others. Was: Re: mono keep guest active - ban the blips.

> If only the monitor could 'know' that the machine was running this batch load at a certain time of day and had an absolute share and was running 100% for an extended period of time. It could be set up to not send out alerts based on all of these criteria. Wow! That would be a very nice feature.
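As a thought experiment, the "learn the normal pattern" idea can be reduced to a per-slot baseline: keep the CPU readings seen for each (day of week, hour) slot and flag a new reading that is far from that slot's history. The sketch below is a toy under stated assumptions (the class name, the mean/standard-deviation rule and the default parameters are all invented here); a real product would need much more care with seasonality, holidays and sparse data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


class ActivityBaseline:
    """Per (weekday, hour) history of CPU readings for one virtual machine."""

    def __init__(self, min_samples=4, tolerance=3.0, floor=1.0):
        self.min_samples = min_samples  # history needed before judging a slot
        self.tolerance = tolerance      # allowed deviation, in spreads
        self.floor = floor              # minimum spread, in CPU percentage points
        self._samples = defaultdict(list)

    def learn(self, when: datetime, cpu_pct: float) -> None:
        """Record a reading that is considered normal for its time slot."""
        self._samples[(when.weekday(), when.hour)].append(cpu_pct)

    def is_unusual(self, when: datetime, cpu_pct: float) -> bool:
        """True if the reading is far from what this slot usually shows."""
        history = self._samples.get((when.weekday(), when.hour), [])
        if len(history) < self.min_samples:
            return False  # not enough history to have an opinion
        spread = max(pstdev(history), self.floor)
        return abs(cpu_pct - mean(history)) > self.tolerance * spread
```

With four Thursdays of batch running at around 96% at 02:00, a reading of 0.5% in that slot is flagged (the abended batch case), and so would a sudden 100% in a slot that is normally near idle (the looping server case).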