On 05/13/2015 09:51 AM, Simon Pasquier wrote:
On Wed, May 13, 2015 at 3:27 PM, David Kranz <dkr...@redhat.com> wrote:
On 05/13/2015 09:06 AM, Simon Pasquier wrote:
Hello,
Like many others commented before, I don't quite understand how
unique are the Cloudpulse use cases.
For operators, I got the feeling that existing solutions fit well:
- Traditional monitoring tools (Nagios, Zabbix, ...) are
necessary anyway for infrastructure monitoring (CPU, RAM, disks,
operating system, RabbitMQ, databases and more) and diagnostic
purposes. Adding OpenStack service checks is fairly easy if you
already have the toolchain.
Is it really so easy? RabbitMQ has an "aliveness" test that is
easy to hook into. I don't know exactly what it does, other than
what the doc says, but I should not have to. If I want my standard
monitoring system to call into a cloud and ask "is nova healthy?",
"is glance healthy?", etc., are there such calls?
Regarding the RabbitMQ aliveness test, it has its own limits (more on that
later, I've got an "interesting" RabbitMQ outage that I'm going to
discuss in a new thread) and it doesn't replicate exactly what the
clients (e.g. OpenStack services) are doing.
I'm sure it has limits, but my point was that the developers of RabbitMQ
understood that it would be difficult for users to know exactly what
should be poked at inside to check health, so they provide a call to do it.
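To make the "call to do it" concrete, here is a minimal sketch of how a monitoring script might use the aliveness test exposed by the RabbitMQ management plugin's HTTP API. It assumes the plugin is enabled on its default port 15672 and that the endpoint answers with a JSON body of {"status": "ok"} on success. The function names, host, and credentials are hypothetical and chosen only for illustration.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def aliveness_url(host, vhost="/", port=15672):
    """Build the aliveness-test URL; the vhost name must be percent-encoded."""
    return "http://%s:%d/api/aliveness-test/%s" % (host, port, quote(vhost, safe=""))


def is_alive(body):
    """Interpret the JSON body returned by the aliveness test."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as unhealthy.
        return False


def check_rabbitmq(host, user, password, vhost="/", timeout=5):
    """Return True only if the broker answers the aliveness test with "ok"."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request = Request(aliveness_url(host, vhost),
                      headers={"Authorization": "Basic " + token})
    try:
        with urlopen(request, timeout=timeout) as response:
            return is_alive(response.read())
    except OSError:
        # Connection refused, timeout, or an HTTP error status.
        return False
```

The caller only needs to know the URL and the meaning of "ok". What the broker does internally to decide that stays hidden behind the call, which is the property being argued for here.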
Regarding the service checks, there are already plenty of scripts that
exist for Nagios, Collectd and so on. Some of them are listed in the
Wiki [1].
I understand, and that is what I meant by "after-market". If someone
puts a new feature in service X that requires some monitoring to be
healthy, then all those different scripts need to chase after it to keep
up to date. Poking at service internals to check the health of a service
is an abstraction violation. As someone on this thread said,
tempest/rally can be used to check a certain kind of health, but it is
akin to black-box testing, whereas health monitoring should be more akin
to white-box testing.
There are various sets of calls associated with nagios, zabbix,
etc. but those seem like "after-market" parts for a car. Seems to
me the services themselves would know best how to check if they
are healthy, particularly as that could change version to version.
Has there been discussion of adding a health-check (admin) API in
each service? Lacking that, is there documentation from any
OpenStack projects about "how to check the health of nova"? When I
saw this thread start, that is what I thought it was going to be
about.
Starting with Kilo, you could configure your OpenStack API services
with the healthcheck middleware [2]. This has been inspired by what
Swift's been doing for some time now [3]. IIUC the default healthcheck
is minimalist and doesn't check that dependent services (like
RabbitMQ, database) are healthy but the framework is extensible and
more healthchecks can be added.
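As a rough illustration of the idea (and not the oslo.middleware implementation), an extensible healthcheck can be sketched as a plain WSGI middleware that answers one path itself and runs a list of pluggable checks. The class name, the default "/healthcheck" path, and the check signature below are all assumptions made for this example.

```python
class HealthcheckMiddleware:
    """Answer requests for `path` itself; pass everything else to the wrapped app."""

    def __init__(self, app, path="/healthcheck", checks=None):
        self.app = app
        self.path = path
        # Each check is a zero-argument callable returning (ok, reason).
        self.checks = list(checks or [])

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") != self.path:
            return self.app(environ, start_response)

        failures = []
        for check in self.checks:
            try:
                ok, reason = check()
            except Exception as exc:  # a crashing check counts as a failure
                ok, reason = False, str(exc)
            if not ok:
                failures.append(reason)

        if failures:
            status, body = "503 Service Unavailable", "\n".join(failures)
        else:
            status, body = "200 OK", "OK"
        payload = body.encode("utf-8")
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(payload)))])
        return [payload]


def demo_app(environ, start_response):
    """Stand-in for a real API service."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"api response"]


def call(app, path):
    """Drive a WSGI app once and return (status, body) for inspection."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], body
```

With no checks configured, the sketch behaves like the minimalist default described above: it only proves the API process is answering. Checks for dependent services such as RabbitMQ or the database would be supplied through the `checks` list.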
I can see that, but the real value would be in abstracting the details of
what it means for a service to be healthy inside the implementation and
exporting an API. If that were present, the question of whether calling
it used middleware or not would be secondary. I'm not sure what the
value-add of middleware would be in this case.
-David
-David
BR,
Simon
[1] https://wiki.openstack.org/wiki/Operations/Tools#Monitoring_and_Trending
[2] http://docs.openstack.org/developer/oslo.middleware/api.html#oslo_middleware.Healthcheck
[3] http://docs.openstack.org/kilo/config-reference/content/object-storage-healthcheck.html
- OpenStack projects like Rally or Tempest can generate synthetic
loads and run end-to-end tests. Integrating them with a
monitoring system isn't terribly difficult either.
As far as Monitoring-as-a-service is concerned, do you have plans
to integrate/leverage Ceilometer?
BR,
Simon
On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari)
<vpand...@cisco.com> wrote:
Hello,
I'm pleased to announce the development of a new project
called CloudPulse. CloudPulse provides OpenStack
health-checking services to operators, tenants, and
applications. This project will begin as
a StackForge project based upon an empty cookiecutter [1]
repo. The repos to work in are:
Server: https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient
Please join us via IRC on #openstack-cloudpulse on freenode.
I am holding a doodle poll to select times for our first
meeting the week after summit. This doodle poll will close
May 24th and meeting times will be announced on the mailing
list at that time. At our first IRC meeting,
we will draft additional core team members, so if you're
interested in joining a fresh new development effort, please
attend our first meeting.
Please take a moment if you're interested in CloudPulse to fill
out the doodle poll here:
https://doodle.com/kcpvzy8kfrxe6rvb
The initial core team is composed of Ajay Kalambur,
Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven
Dake and Vinod Pandarinathan.
I expect more members to join during our initial meeting.
A little bit about CloudPulse:
Cloud operators need notification of OpenStack failures
before a customer reports the failure. Cloud operators can
then take timely corrective actions with minimal disruption
to applications. Many cloud applications, including
those I am interested in (NFV), have very stringent service
level agreements. Loss of service can trigger contractual
costs associated with the service. Application high
availability requires an operational OpenStack cloud, and the
reality is that occasionally OpenStack clouds fail in some
mysterious ways. This project intends to identify when those
failures occur so corrective actions may be taken by operators,
tenants, and the applications themselves.
OpenStack is considered healthy when OpenStack API services
respond appropriately. Further, OpenStack is
healthy when network traffic can be sent between the tenant
networks and can access the Internet. Finally, OpenStack
is healthy when all infrastructure cluster elements are in an
operational state.
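As a purely illustrative sketch of that three-part definition (not CloudPulse code), the overall verdict can be computed by grouping individual check results into the three categories and requiring that each category reported and that nothing failed. The category labels, the `CheckResult` type, and the check names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the three conditions in the definition above.
CATEGORIES = ("api", "network", "infrastructure")


@dataclass
class CheckResult:
    category: str  # one of CATEGORIES
    name: str      # e.g. "nova-api" or "tenant-ping"
    ok: bool


def cloud_health(results):
    """Return (healthy, failing, missing).

    healthy is True only when every category has at least one result
    and no result in any category failed.
    """
    failing = {category: [] for category in CATEGORIES}
    seen = set()
    for result in results:
        seen.add(result.category)
        if not result.ok:
            failing.setdefault(result.category, []).append(result.name)
    missing = [category for category in CATEGORIES if category not in seen]
    healthy = not missing and not any(failing.values())
    return healthy, failing, missing
```

Treating a category with no results as unhealthy is a design choice of this sketch: a silent checker is indistinguishable from a broken one, so the absence of data should not be reported as good health.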
For information about blueprints check out:
https://blueprints.launchpad.net/cloudpulse
https://blueprints.launchpad.net/python-cloudpulseclient
For more details, check out our Wiki:
https://wiki.openstack.org/wiki/Cloudpulse
Please join the CloudPulse team in designing and implementing
a world-class Carrier Grade system for checking
the health of OpenStack clouds. We look forward to seeing
you on IRC on #openstack-cloudpulse.
Regards,
Vinod Pandarinathan
[1] https://github.com/openstack-dev/cookiecutter
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev