Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
Is it monitoring or metering? Ceilometer does metering.

Endre.

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
Availability metrics for me are ones that allow me to tell if the service is up, degraded or down. Each of us, as we start production monitoring, needs to work out how many nova, glance and swift processes of which type should be running. Furthermore, we need to add basic 'ping' style probes to see that the services are responding as expected.

Performance metrics are for cases where we want to record how well the system is running. Examples are the number of REST calls/second, VMs created/second, etc. These are the kind of metrics which feed into capacity planning, bottleneck identification and trending.

Building up an open, standard and consistent set will avoid duplicate effort as sites deploy to production and allow us to keep the monitoring up to date when the internals of OpenStack change.

Tim
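Tim's up/degraded/down distinction for availability can be sketched roughly as below. This is only an illustration: the ServiceCheck record, its field names and the classification rule are assumptions made for the example, not anything OpenStack provides.

```python
from dataclasses import dataclass


@dataclass
class ServiceCheck:
    """Hypothetical record for one service; not an OpenStack structure."""
    name: str
    expected_processes: int  # how many processes the site decided should run
    running_processes: int   # how many were actually found
    probe_ok: bool           # result of a basic 'ping' style request


def availability(check: ServiceCheck) -> str:
    """Classify a service as 'up', 'degraded' or 'down'.

    Assumed rule: no processes or a failed probe means down; fewer
    processes than expected (but still answering) means degraded.
    """
    if check.running_processes == 0 or not check.probe_ok:
        return "down"
    if check.running_processes < check.expected_processes:
        return "degraded"
    return "up"
```

A site would fill in the expected process counts per service type itself; the point of the thread is that those numbers ideally come from the projects rather than from each deployer.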
Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
Thanks. Now I understand the performance metrics you guys were talking about. It'd be good if we could have some tool reporting numbers for a cloud, just as 'mpstat' and 'iostat' do for a system.
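An 'iostat'-like report for a cloud could, as a rough sketch, turn two snapshots of cumulative counters into per-second rates. The counter names (rest_calls, vms_created) and both function names are hypothetical; where the counters would come from is left open here, as it is in the thread.

```python
def rates(previous, current, interval_seconds):
    """Per-second rates between two snapshots of cumulative counters.

    Counters missing from the previous snapshot are treated as starting at 0.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        name: (current[name] - previous.get(name, 0)) / interval_seconds
        for name in current
    }


def format_report(rates_by_name):
    """Render the rates as a small fixed-width table, one metric per line."""
    lines = [f"{'metric':<20}{'per second':>12}"]
    for name in sorted(rates_by_name):
        lines.append(f"{name:<20}{rates_by_name[name]:>12.2f}")
    return "\n".join(lines)
```

Sampling on a fixed interval and printing format_report(rates(old, new, interval)) each time would give the rolling view that 'iostat 10' gives for disks.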
Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
Hi Tim,

Could you elaborate more on 'performance metrics'? What kind of metrics are considered performance ones?

Thanks.

--
Regards
Huang Zhiteng
Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
This is a really great list!

With regard to cluster health and monitoring, I did a bunch of stuff with Swift before turning to nova and really appreciated the way each swift service has a healthcheck call that can be used by a monitoring system. While I don't think providing a production-ready monitoring system should be part of core OpenStack, it is the core architects who really know what needs to be checked to ensure that a system is healthy. There are various sets of poking at ports, process lists and so on that Crowbar, Zenoss, etc. set up, but it would be a big improvement for deployers if each openstack service provided healthcheck APIs based on expert knowledge of what is supposed to be happening inside. That would also insulate deployers from changes in the code that might impact what it means to be running properly.

Looking forward to the discussion.

-David
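The kind of healthcheck call David describes could be polled by a monitoring system along these lines. This is a minimal sketch: it assumes the service answers GET /healthcheck with HTTP 200 and a body of "OK", and the helper names and the example host are made up for illustration.

```python
import urllib.error
import urllib.request


def http_get(url, timeout=5.0):
    """Return (status_code, body_text) for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code, ""


def is_healthy(base_url, fetch=http_get):
    """True if the assumed /healthcheck endpoint reports 200 and 'OK'.

    'fetch' is injectable so the check can be exercised without a network.
    """
    try:
        status, body = fetch(base_url.rstrip("/") + "/healthcheck")
    except OSError:
        # Connection refused, timeout, DNS failure and similar.
        return False
    return status == 200 and body.strip() == "OK"
```

For example, is_healthy("http://proxy.example:8080") would probe one (placeholder) endpoint; the monitoring system only needs the boolean, not knowledge of the service internals.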
Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
Splitting monitoring into three parts:

1. Gathering of metrics (availability, performance) and reporting in a standard fashion should be part of OpenStack.
2. Best practice sensors should sample the metrics and provide alarms for issues which could cause service impacts. Posting of these alarms to a monitoring system should be based on plug-ins.
3. Reference implementations for standard monitoring systems such as Nagios should be available that query the data above and feed it into the package selected.

Each site does not want to be involved in defining the best practice. Equally, each monitoring system should not have to have an intimate understanding of OpenStack to produce a red/green light.

The components for 1 and 2 fall under the associated openstack component. Component 3 is the monitoring solution provider.

Tim
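As a sketch of how parts 2 and 3 of Tim's split might fit together, a sensor can compare a sampled metric against thresholds and report it using the Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The metric name, the thresholds and the function names are assumptions for the example only.

```python
# Nagios plugin convention: the exit code carries the state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a sampled value to a state; higher values are assumed worse."""
    if value is None:
        return UNKNOWN  # the metric could not be sampled
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def plugin_output(metric, value, warn, crit):
    """Return (exit_code, status_line) in the shape a plugin would emit."""
    code = evaluate(value, warn, crit)
    return code, f"{metric} {LABELS[code]} - value={value}"
```

A thin wrapper script would print the status line and exit with the code; the thresholds are the "best practice" piece that, per the message above, each site should not have to define on its own.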
Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
Interested in devops.

Off the top of my head.

live upgrades
api queryable indications of cluster health
api queryable cluster version and configuration info
enabling monitoring as a first class concern in OpenStack (either as a cross cutting concern, or as its own project)
a framework for gathering and sharing performance benchmarks with architecture and configuration

On Thu, Apr 5, 2012 at 1:52 PM, Duncan McGreggor dun...@dreamhost.com wrote:

For anyone interested in DevOps, Ops, cloud hosting management, etc., there's a proposed session we could use your feedback on for topics of discussion: http://summit.openstack.org/sessions/view/57

Respond with your thoughts and ideas, and I'll be sure to add them to the list.

Thanks!

d