Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-03-10 Thread gordon chung
hi,

just to follow up: thanks for the input. the usability of ceilometer is
obviously a concern of ours and something the team tries to address with the
resources we have.

as a quick help/update, here are some points of interest that i think might
help:

- if using Juno+, DO use the notifier:// publisher rather than rpc://, as
  there is a certain level of overhead that comes with rpc[1]. you can also
  configure multiple messaging servers if there are load issues.
- a part of the telemetry team has been exploring tsdb and we expect to have
  a tech preview for Kilo. the project is called Gnocchi[2].
- in Kilo, we expanded notification event handling (existing stacktach
  integration code) and said events can be published to an external
  source(s) or to a database (ElasticSearch for full-text querying, in
  addition to mongo, sql).
- ceilometer does not configure databases. operators are expected to read up
  on the db of choice and properly configure the db to their needs (ie.
  don't run a default mongo install on a single node with no sharding to
  store data from 2000 nodes)[3].
- DO adjust your pipeline to only store events/meters that you use. by
  default, ceilometer gives you the world and from there you can filter
  based on requirements.
- it's entirely possible to use ceilometer to gather data and store it
  externally and avoid ceilometer storage (if you so choose).
- DO NOT use SQL backend prior to Juno... for any deployment size... any...
- there was some work in Kilo to jitter the polling cycle of agents to
  distribute load.
- the agents are designed to scale horizontally to increase bandwidth. also,
  they work independently, so if you want just notifications, it's possible
  to deploy just the notification agent and nothing else.
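To make the pipeline-filtering point concrete, here is a minimal Python sketch of include/exclude meter selection. It is an illustration only, not ceilometer's own code: the function name `meter_selected`, the rule list and the sample meter names are assumptions, and the rule syntax (shell-style wildcards plus a leading "!" for exclusion) only mimics the style of a pipeline meter list.

```python
from fnmatch import fnmatchcase


def meter_selected(name, rules):
    """Return True if meter `name` passes an include/exclude rule list.

    Hypothetical helper: rules starting with "!" exclude, all others include.
    """
    excludes = [rule[1:] for rule in rules if rule.startswith("!")]
    includes = [rule for rule in rules if not rule.startswith("!")]
    if any(fnmatchcase(name, pattern) for pattern in excludes):
        return False
    # A list made only of exclusions means "everything except these".
    if not includes:
        return True
    return any(fnmatchcase(name, pattern) for pattern in includes)


# Assumed example: keep only the meters a billing report would need.
wanted = ["cpu", "disk.*", "network.incoming.bytes"]
samples = ["cpu", "cpu_util", "disk.read.bytes", "image.size",
           "network.incoming.bytes"]
kept = [meter for meter in samples if meter_selected(meter, wanted)]
print(kept)
```

The design point is the one made above: start from everything and narrow it down to what you actually consume, so that unused samples never reach storage.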

we've also been updating -- and are still continuing to update -- some of the
docs to better reflect some of the changes made to Ceilometer in Juno and
Kilo[4][5]. in particular, i'd probably look at the architecture diagram[6] to
get an idea of what components of ceilometer you could use to fit your needs.

i've probably missed stuff but i hope the above helps. as always, community
help is invited. if you have a patch that will improve ceilometer, the
community gladly welcomes it.

[1] https://www.rabbitmq.com/tutorials/tutorial-six-python.html
[2] http://www.slideshare.net/EoghanGlynn/rdo-hangout-on-gnocchi
[3] http://blog.sileht.net/using-a-shardingreplicaset-mongodb-with-ceilometer
[4] http://docs.openstack.org/admin-guide-cloud/content/ch_admin-openstack-telemetry.html
[5] http://docs.openstack.org/developer/ceilometer/
[6] http://docs.openstack.org/developer/ceilometer/architecture.html (self-plug
    for my amazing diagram skills)

cheers,
gord
  ___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-16 Thread Sanjay Mishra
We've been working with Ceilometer since Grizzly and our experience has been 
positive, albeit with a couple of caveats.  Our product, OpenBook, depends upon 
Ceilometer for the metering data to drive rating and billing.  As with most 
things in OpenStack today, we've taken advantage of what Ceilometer does well 
and been able to work around those things that could use some improvement.  
What Ceilometer does very well is be a funnel for provisioning, state change 
and consumption data from OpenStack.  We would not have been able to make the 
progress we did with our solution if Ceilometer didn't exist as a data provider 
within OpenStack.  What it doesn't do very well today is be a scalable 
long-term repository of raw and aggregated data.  In our case, we already had a 
model where we collected data from multiple sources, and normalized and 
aggregated this data before storing it locally, so this shortcoming didn't 
present a problem for us.  So, YMMV depending on your use case, but for us and
our customers, Ceilometer gets the job done.

--Sanjay


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-16 Thread Allamaraju, Subbu
You bring up a very good point. What are the parts that are uniquely useful? 
Here are two key reasons why we chose to bite the bullet and operationalize 
Ceilometer at work:

1. Stable set of APIs for tenants to access and publish metrics and alarms 
2. An unobtrusive way for providers to collect metrics of tenant resources

There are a ton of tools out there to gather metrics, to store, to process, to
raise alarms etc. However, without a stable set of interfaces, it is tough to
build large ecosystems.

My 2 cents.

Subbu

> On Feb 16, 2015, at 4:40 AM, Chris Dent  wrote:
> 
> On Thu, 12 Feb 2015, Clint Byrum wrote:
> 
>> I wonder how hard it would be to push Ceilometer down the road of being
>> an OpenStack shim for collectd instead of a full implementation. This
>> would make the problem above go away, as collectd is written in C and is
>> well known to be highly optimized for exactly this type of workload.
> 
> I think this otherwise interesting idea jumps the gun a bit in a few
> different ways.
> 
> * We first need to identify what people are actually hoping to do with
>  Ceilometer or something like it. In this thread alone we've got talk
>  of metering, billing, rating, monitoring, alarming/auto-scaling
>  without any of those terms being very well defined. It's obvious
>  that any one service is not going to be able to do all of those things
>  well but it is not obvious how, without defining the terms, we
>  can figure out how to do some small number of them well.
> 
> * We also need to identify the parts of Ceilometer that are uniquely
>  useful. That is what parts of it are not otherwise covered by
>  existing tools that have an associated healthy opensource ecosystem.
>  I'm not really sure the answer to this but to toss out some ideas:
>  The things that Ceilometer has that make it special are the polling
>  and notification agents and the associated pipeline. These are the
>  parts that gather and transform events and meters that are unique to
>  the openstack environment.
> 
>  (Curiously the gatherers are also the parts of Ceilometer that I think
>  should be in the repos of other projects as plugins which generate
>  notifications but that's a different topic.)
> 
> Gnocchi is very interesting because it follows what has become a time
> honored style in OpenStack: Create an abstraction layer over a
> relatively small area of purpose and provide an easy to replace default
> driver. It also does this in a context that is independent from
> Ceilometer.
> 
> This could get us a few things:
> 
> * An improvable storage and reporting backend for Ceilometer that can
>  evolve separately from Ceilometer.
> * A way to shrink Ceilometer itself so that it can become more
>  narrowly focused on whatever its core purpose is defined to be.
>  This is not that complex of a task: Ceilometer is already structured
>  such that it is quite straightforward to send the results of the
>  pipeline wherever we like. Gnocchi is one such destination.
> 
> But again: We first need to figure out what people actually want to do
> and care about.
> -- 
> Chris Dent tw:@anticdent freenode:cdent
> https://tank.peermore.com/tanks/cdent


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-16 Thread Chris Dent

On Thu, 12 Feb 2015, Clint Byrum wrote:


> I wonder how hard it would be to push Ceilometer down the road of being
> an OpenStack shim for collectd instead of a full implementation. This
> would make the problem above go away, as collectd is written in C and is
> well known to be highly optimized for exactly this type of workload.


I think this otherwise interesting idea jumps the gun a bit in a few
different ways.

* We first need to identify what people are actually hoping to do with
  Ceilometer or something like it. In this thread alone we've got talk
  of metering, billing, rating, monitoring, alarming/auto-scaling
  without any of those terms being very well defined. It's obvious
  that any one service is not going to be able to do all of those things
  well but it is not obvious how, without defining the terms, we
  can figure out how to do some small number of them well.

* We also need to identify the parts of Ceilometer that are uniquely
  useful. That is what parts of it are not otherwise covered by
  existing tools that have an associated healthy opensource ecosystem.
  I'm not really sure the answer to this but to toss out some ideas:
  The things that Ceilometer has that make it special are the polling
  and notification agents and the associated pipeline. These are the
  parts that gather and transform events and meters that are unique to
  the openstack environment.

  (Curiously the gatherers are also the parts of Ceilometer that I think
  should be in the repos of other projects as plugins which generate
  notifications but that's a different topic.)

Gnocchi is very interesting because it follows what has become a time
honored style in OpenStack: Create an abstraction layer over a
relatively small area of purpose and provide an easy to replace default
driver. It also does this in a context that is independent from
Ceilometer.

This could get us a few things:

* An improvable storage and reporting backend for Ceilometer that can
  evolve separately from Ceilometer.
* A way to shrink Ceilometer itself so that it can become more
  narrowly focused on whatever its core purpose is defined to be.
  This is not that complex of a task: Ceilometer is already structured
  such that it is quite straightforward to send the results of the
  pipeline wherever we like. Gnocchi is one such destination.

But again: We first need to figure out what people actually want to do
and care about.
--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent



Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-16 Thread Chris Dent

On Thu, 12 Feb 2015, George Shuklin wrote:

> 1. Collector leaks memory. We ran it on same host with mongo, and it grab
> 29Gb out of 32, leaving mongo with less than gig memory available.


Is this icehouse, juno or kilo? I ask because a) things have changed a
lot in the past several months (and continue to change) and b) this
might be fixable and is the sort of thing I like to fix.

--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent



Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Andy Hill
> We use StackTach only as a troubleshooting tool. If a user is having an
> issue, we'll bring up their event history and review their timeline. I think
> this alone makes it an invaluable tool.

+1 for StackTach as an invaluable troubleshooting tool.

It's probably worth calling out the stacky CLI tool [1] for pulling data
from StackTach.

[1] https://github.com/rackerlabs/stacky



Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Sandy Walsh
Yagi [1] is a really easy way to consume notifications and do stuff with them. 
We use it as the basis for our STv3 consumption. 

Very easy to write a Handler (fill in the handle_events() method) and you're 
off to the races. 

Yagi has a proper worker that can be daemonized and flexible logging. Spawn 
many for larger loads. 
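
As a rough illustration of the "write a Handler" idea, here is a self-contained Python sketch. Only the method name handle_events() comes from the message above; the class `CountingHandler`, the event dictionaries and the return value are assumptions made for the example. It does not subclass Yagi's real base handler, whose actual interface should be checked in the Yagi repository [1].

```python
from collections import Counter


class CountingHandler:
    """Hypothetical handler that tallies notifications by event_type."""

    def __init__(self):
        self.counts = Counter()

    def handle_events(self, events):
        # Each event is assumed to be a dict decoded from a notification.
        for event in events:
            self.counts[event.get("event_type", "unknown")] += 1
        # Hand the events back so that another handler could be chained.
        return events


handler = CountingHandler()
handler.handle_events([
    {"event_type": "compute.instance.create.start"},
    {"event_type": "compute.instance.create.end"},
    {"event_type": "compute.instance.create.end"},
])
print(handler.counts.most_common(1))
```

In a real deployment the body of handle_events() would forward the events to storage or another queue rather than count them.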

Even easier, but not as battle tested, is notabene [2] ... here's an example of 
what it takes to consume notifications [3]

[1] https://github.com/rackerlabs/yagi
[2] https://github.com/StackTach/notabene
[3] https://github.com/stackforge/stacktach-notigen/blob/master/bin/event_consumer.py






Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Clint Byrum
Excerpts from George Shuklin's message of 2015-02-11 17:59:02 -0800:
> Ceilometer is in sad state.
> 
> 1. Collector leaks memory. We ran it on same host with mongo, and it 
> grab 29Gb out of 32, leaving mongo with less than gig memory available.

I wonder how hard it would be to push Ceilometer down the road of being
an OpenStack shim for collectd instead of a full implementation. This
would make the problem above go away, as collectd is written in C and is
well known to be highly optimized for exactly this type of workload.

You would need a more advanced AMQP plugin that understands how to turn
the notifications in OpenStack into collectd values, and then make some
decisions on whether to keep Ceilometer's SQL/MongoDB backend or just
teach Ceilometer to read from the various collectd output formats. I
think the latter will be a bigger win, but the former would be easier
for a more incremental migration.
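
To sketch what "turning notifications into collectd values" could look like, the following Python example renders a notification as lines for collectd's plain-text PUTVAL command. Everything here is an assumption for illustration: the function name `notification_to_putval`, the "openstack" plugin name, the use of the generic "gauge" type, and the payload fields (vcpus, memory_mb, disk_gb, instance_id). It is not an existing collectd or Ceilometer plugin.

```python
import calendar
from datetime import datetime


def notification_to_putval(notification,
                           fields=("vcpus", "memory_mb", "disk_gb"),
                           interval=60):
    """Render numeric payload fields as collectd plain-text PUTVAL lines."""
    payload = notification["payload"]
    stamp = datetime.strptime(notification["timestamp"],
                              "%Y-%m-%d %H:%M:%S.%f")
    epoch = calendar.timegm(stamp.timetuple())  # timestamp assumed to be UTC
    lines = []
    for field in fields:
        if field not in payload:
            continue  # skip values this notification does not carry
        # Identifier layout: host/plugin-instance/type-type_instance
        ident = "%s/openstack-%s/gauge-%s" % (
            notification["publisher_id"], payload["instance_id"], field)
        lines.append('PUTVAL "%s" interval=%d %d:%s'
                     % (ident, interval, epoch, payload[field]))
    return lines


# Hypothetical, heavily trimmed notification.
note = {
    "publisher_id": "compute.node1",
    "timestamp": "2015-02-12 18:28:00.000000",
    "payload": {"instance_id": "abc123", "vcpus": 2, "memory_mb": 4096},
}
for line in notification_to_putval(note):
    print(line)
```

A real shim would also have to decide which event types carry values worth recording and how to map them onto entries in collectd's types.db.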

Anyway, if people are interested in "saving" Ceilometer from being a
bit sluggish, that seems like a good first step in the investigation.



Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Kris G. Lindgren
We have also dropped ceilometer.

The load that it generated on the APIs was causing nova queries to take over
30 seconds and time out, because they were all backing up on neutron trying to
get VM network info. Also, a simple meter list resulted in 100% CPU usage of
both ceilometer and the ceilometer-client and took over a minute to return any
data.  In the end, it was simply unusable for people to integrate with.

Though we are considering re-enabling it for the notifications -> kafka portion
only.  I think, like others before, we ended up gathering the metric stuff
outside of openstack; we used diamond -> graphite.  Though in the future we
might switch from graphite to something else (opentsdb?).


Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.




Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Diego Parrilla Santamaría
Hi Maish,

we dropped Ceilometer as the core tool to gather metrics for our rating and
billing system. I must admit it has improved, but I think it's broken by
design: a metering system and a monitoring system are not the same thing.

We have built a component that listens directly to the rabbit notifications
(a la StackTach). This tool stores all the events in a database (but
anything could work, it's just a logging system) and then we process these
events and store them in a datamart style database every hour. The rating
and billing system reads this database and processes it every hour too. We
decided to implement this pipeline processing of data because we knew in
advance that processing such an amount of data was a challenge.
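
The hourly step described above can be sketched in Python as follows. This is only an illustration of the raw-events-to-datamart idea, not the actual component: the function name `hourly_rollup`, the event fields (tenant_id, event_type, timestamp) and the choice to count events per bucket are all assumptions.

```python
from collections import defaultdict
from datetime import datetime


def hourly_rollup(events):
    """Count raw events per (tenant, event_type, hour) bucket."""
    buckets = defaultdict(int)
    for event in events:
        stamp = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour to get the bucket key.
        hour = stamp.replace(minute=0, second=0)
        key = (event["tenant_id"], event["event_type"], hour.isoformat())
        buckets[key] += 1
    return dict(buckets)


# Hypothetical raw event log, as it might sit in the logging database.
raw_events = [
    {"tenant_id": "t1", "event_type": "compute.instance.create.end",
     "timestamp": "2015-02-12 10:05:00"},
    {"tenant_id": "t1", "event_type": "compute.instance.create.end",
     "timestamp": "2015-02-12 10:40:00"},
    {"tenant_id": "t1", "event_type": "compute.instance.delete.end",
     "timestamp": "2015-02-12 11:15:00"},
]
rollup = hourly_rollup(raw_events)
for key in sorted(rollup):
    print(key, rollup[key])
```

Keeping the raw log and the aggregated table separate means the rating step only ever reads the small hourly table, which is the point of the pipeline described above.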

I think Ceilometer should be used just to trigger alarms for heat for
example, and something else should be used for rating and billing.

Cheers
Diego



 --
Diego Parrilla
CEO
www.stackops.com | diego.parri...@stackops.com |
+34 91 005-2164 | skype:diegoparrilla



On Wed, Feb 11, 2015 at 8:37 PM, Maish Saidel-Keesing wrote:

> Is Ceilometer ready for prime time?
>
> I would be interested in hearing from people who have deployed OpenStack
> clouds with Ceilometer, and their experience. Some of the topics I am
> looking for feedback on are:
>
> - Database Size
> - MongoDB management, Sharding, replica sets etc.
> - Replication strategies
> - Database backup/restore
> - Overall usability
> - Gripes, pains and problems (things to look out for)
> - Possible replacements for Ceilometer that you have used instead
>
>
> If you are willing to share - I am sure it will be beneficial to the whole
> community.
>
> Thanks in Advance
>
>
> With best regards,
>
>
> Maish Saidel-Keesing
> Platform Architect
> Cisco
>
>
>
>


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Joe Topjian
Hi Tim,

> Does anyone have any proposals regarding
>
> > - Possible replacements for Ceilometer that you have used instead
>
> It seems that many sites have written their own systems.

Sorry - I should have appended this at the end of my last post.

I need to preface this with "I have never used Ceilometer nor do our
environments require billing". But we're already collecting a lot of
information that could be used for billing.

The `nova usage-list` command reports a tenant's compute resource
allocation per 24 hour period.

For per-instance metrics, I've posted a script that will collect them here:

https://github.com/osops/tools-generic/blob/master/libvirt/instance_metrics.rb

I recently discovered that the `nova diagnostics` command reports almost
the same information, minus the CPU usage that I'm polling via `ps`. This
might not be needed for most environments, though, and so `nova
diagnostics` alone should be fine.

So between all of this information, we're able to create a good picture of
a tenant's compute usage. Of course, if we were to do billing, this would
all need to be fed into a billing system of some sort. Plus, the 24 hour
resolution might be too coarse.

But hopefully it gives a good indication that polling some basic metrics of
compute usage doesn't require a lot of resources. :)


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Joe Topjian
Hi Sandy,

> That said, I'd love to hear about headaches and failures of the older
> StackTach release and how people are using it, or hope to use it.

We have two StackTach v2 environments, one of which has been running for
almost 3 years. For that particular environment, it can be a bear to do
queries, sometimes taking up to a few minutes. This is understandable with
how the information is stored in the db.

Another issue we've seen is that the workers sometimes fail to reconnect to
Rabbit after a WAN outage. The remedy for that is to restart the workers
from a cron.

But other than that, it runs great. Our environments are definitely not at
the same scale as, say, eBay or CERN, and so operating StackTach has been
manageable.

We use StackTach only as a troubleshooting tool. If a user is having an
issue, we'll bring up their event history and review their timeline. I
think this alone makes it an invaluable tool.

I reviewed all of your StackTach v3 stuff the other week. At first glance,
there's definitely a lot more moving parts than with v2, but after reading
about each one, they all make sense. I'm looking forward to trying some of
it out.

I'd be happy to talk more in Philadelphia if you'd like. :)

Thanks,
Joe


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Sandy Walsh
btw, if you want to know how StackTach handles billing, here's the salient part 
from our Hong Kong presentation [1], from back when we were attempting 
Ceilometer integration.

[1] http://youtu.be/c8zZtSL0t00?t=8m26s


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread matt
so my understanding is that the billing part of ceilometer has to do with
financial requirements in how and what is reported in metrics.  ie... you
cannot bill for rounded values or heuristics.  but stacktach has no such
specific need to solve for financial needs and really is intended to help
an operator isolate or identify issues / potential issues.

that's a pretty big difference in use case.

-matt

On Thu, Feb 12, 2015 at 11:30 AM, Kris G. Lindgren wrote:

> Event-based Monitoring & Billing solution for OpenStack
>
> Unsure what it's checking out for billing though.
> 
>
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy, LLC.
>
>
>
> On 2/12/15, 9:17 AM, "Matt Joyce"  wrote:
>
> >I thought stacktach was more in the vein of diagnostic.  Not billable
> >resources.
> >
> >On Feb 12, 2015 10:47 AM, Tim Bell  wrote:
> >>
> >> Does anyone have any proposals regarding
> >>
> >> > - Possible replacements for Ceilometer that you have used instead
> >>
> >> It seems that many sites have written their own systems. The
> >> stacktach/monasca teams are due to demo to the operators meetup in
> >> Philadelphia in March.
> >>
> >> Does anyone have experience to share comparing ceilometer with
> >> stacktach?
> >>
> >> Tim
> >>
> >> > -Original Message-
> >> > From: Daniele Venzano [mailto:daniele.venz...@eurecom.fr]
> >> > Sent: 12 February 2015 12:24
> >> > To: openstack-operators@lists.openstack.org
> >> > Subject: Re: [Openstack-operators] [Ceilometer] Real world experience
> >> > with Ceilometer deployments - Feedback requested
> >> >
> >> > Unfortunately, I can only confirm the sorry state of Ceilometer.
> >> > We tried it on a very small setup (6 compute nodes) and ran into so
> >> > many issues, we dropped it and created our own solution based on a mix
> >> > of scripts that read from the nova/neutron DB, iptables and collectd
> >> > data. No need for more collection agents than what we are already
> >> > running for the systems monitoring.
> >> >
> >> > We tried the version in Havana and, later, in Icehouse. For starters
> >> > the documentation was suggesting MySQL as default backend. MySQL will
> >> > last just a few days and then break down under the size of the tables.
> >> > We tried MongoDB, but were still not satisfied with performance on
> >> > such a small cluster.
> >> > Then there is the metering agent. It is yet another daemon, not
> >> > integrated in Neutron and there is no documentation about what it is
> >> > actually measuring. What if I have multiple routers? Ingress and
> >> > Egress? From which point of view?
> >> > The same applies to Cinder, it requires an external agent (to be run
> >> > via cron!).
> >> >
> >> > Some metrics were not recorded, we couldn't understand why and, again,
> >> > no documentation and no tooling to help us understand whether we were
> >> > just missing some config options somewhere in nova-compute or there
> >> > was some other problem with KVM/libvirt versions.
> >> > And even when we had some data and wanted to generate just a
> >> > proof-of-concept report with some information about tenant resource
> >> > usage, we found problems with the API. The fact that no one had
> >> > bothered to write a simple proof of concept script that uses the API
> >> > to actually do something useful was really off-putting.
> >> >
> >> > We had to dig in libvirt to understand what some of the metrics
> >> > actually mean. We found that we could read those same metrics from our
> >> > (more efficient, well-known) monitoring system.
> >> >
> >> > For some time we ran just the agents and aggregated the data in an
> >> > elasticsearch instance through the UDP msgpack pipeline (more bugs,
> >> > message format is inconsistent, different agents generate different
> >> > fields, in slightly different formats).
> >> > It works. But for our needs it was just too much work. Most of the
> >> > data is already available from ot

Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Sandy Walsh

Hey Tim!

Thanks for the mention. I'm keen to hear the responses on this as
well.

I haven't been very active on the ML recently, so perhaps it's a good time
for an update (or an intro for those not familiar with StackTach [1])

StackTach started out as a diagnostics tool. It consumes notifications
from Nova and Glance and gives you tools for watching "operations" as
they flow through the system. An operation might be "create instance",
or "migrate" or "add network", etc. Pretty handy stuff. Especially if
you're in the process of standing up a new OpenStack deploy.
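
To make the idea of an "operation" concrete, here is a minimal sketch of grouping notifications by request and ordering them in time. It is not StackTach code; the flat `request_id`, `event_type` and `timestamp` fields are assumptions chosen for illustration, and real notification payloads are nested differently.

```python
from collections import defaultdict


def group_operations(notifications):
    """Group notification dicts by request id, each group sorted by time."""
    operations = defaultdict(list)
    for note in notifications:
        operations[note["request_id"]].append(note)
    for events in operations.values():
        events.sort(key=lambda note: note["timestamp"])
    return dict(operations)


def summarize(events):
    """Describe one operation by its first event, last event and duration."""
    return {
        "first": events[0]["event_type"],
        "last": events[-1]["event_type"],
        "seconds": events[-1]["timestamp"] - events[0]["timestamp"],
    }


# Hypothetical, simplified notifications (timestamps in seconds).
notifications = [
    {"request_id": "req-1", "event_type": "compute.instance.create.end", "timestamp": 112.0},
    {"request_id": "req-2", "event_type": "image.upload", "timestamp": 105.0},
    {"request_id": "req-1", "event_type": "compute.instance.create.start", "timestamp": 100.0},
]

operations = group_operations(notifications)
create_summary = summarize(operations["req-1"])
```

Timing data like the `seconds` value is the sort of thing that can feed performance monitoring once the events are grouped this way.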

We quickly found we could get some other really cool information from
these notifications. Performance monitoring, auditing, billing and
usage data ... lots of cool stuff. Within Rax we have StackTach
deployed in all of our regions and use it for all these purposes.

StackTach doesn't really compare with Ceilometer or Monasca. We are
100% focused on notification/event management and not metrics
(CPU=80%). Monasca would be a better comparison in that case.

But, StackTach is not great. It takes some real care and feeding to run at
scale. Particularly with the workers. StackTach has no provisions for
horizontal scaling. And there are no provisions for long term
archiving. We do it, but it's fragile.

So, about a year ago, we started working on StackTach version 3 (STv3)
to address these problems [2]. We're currently rolling this out within
Rax. We're still in the "driving a car with square wheels" phase, but
it's getting better. We're horizontally scalable. We have Ansible
deploy scripts. We support long term archiving to Swift, and soon to
HDFS. We're highly componentized so you can pick and choose the pieces
you want to use (as Monasca is doing, wrapping many of our libraries
to fit their model). And we should be able to support most
notification types ... not just Nova and Glance and not just
OpenStack. We're aiming to make this a broad solution.

Hopefully we'll be able to show more at the Ops meetup :)

That said, I'd love to hear about headaches and failures of the older
StackTach release and how people are using it, or hope to use it.

Cheers! 
-S

PS> I'm behind on my screencast series. Hopefully I'll get them updated once
we get past pre-prod. :)

[1] https://github.com/stackforge?query=stacktach
[2] https://www.youtube.com/playlist?list=PLmyM48VxCGaW5pPdyFNWCuwVT1bCBV5p3


>
>From: Tim Bell [tim.b...@cern.ch]
>Sent: Thursday, February 12, 2015 11:47 AM
>To: Daniele Venzano; openstack-operators@lists.openstack.org
>Subject: Re: [Openstack-operators] [Ceilometer] Real world experience   with   
> Ceilometer deployments - Feedback requested
>
>Does anyone have any proposals regarding
>
>> - Possible replacements for Ceilometer that you have used instead
>
>It seems that many sites have written their own systems. The stacktach/monasca 
>teams are due to demo to the operators meetup in Philadelphia  in March.
>
>Does anyone have experience to share comparing ceilometer with stacktach ?
>
>Tim

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Kris G. Lindgren
Event-based Monitoring & Billing solution for OpenStack

Unsure what it's checking out for billing though.

 
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.



On 2/12/15, 9:17 AM, "Matt Joyce"  wrote:

>I thought stacktach was more in the vein of diagnostic.  Not billable
>resources. 
>
>On Feb 12, 2015 10:47 AM, Tim Bell  wrote:
>>
>> Does anyone have any proposals regarding
>>
>> > - Possible replacements for Ceilometer that you have used instead
>>
>> It seems that many sites have written their own systems. The
>>stacktach/monasca teams are due to demo to the operators meetup in
>>Philadelphia  in March.
>>
>> Does anyone have experience to share comparing ceilometer with
>>stacktach ? 
>>
>> Tim 
>>

Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Matt Joyce
I thought stacktach was more in the vein of diagnostics, not billable 
resources. 

On Feb 12, 2015 10:47 AM, Tim Bell  wrote:
>
> Does anyone have any proposals regarding 
>
> > - Possible replacements for Ceilometer that you have used instead 
>
> It seems that many sites have written their own systems. The 
> stacktach/monasca teams are due to demo to the operators meetup in 
> Philadelphia  in March. 
>
> Does anyone have experience to share comparing ceilometer with stacktach ? 
>
> Tim 
>

Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Tim Bell
Does anyone have any proposals regarding

> - Possible replacements for Ceilometer that you have used instead

It seems that many sites have written their own systems. The stacktach/monasca 
teams are due to demo at the operators meetup in Philadelphia in March.

Does anyone have experience to share comparing ceilometer with stacktach?

Tim 


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-12 Thread Daniele Venzano
Unfortunately, I can only confirm the sorry state of Ceilometer.
We tried it on a very small setup (6 compute nodes) and ran into so many issues 
that we dropped it and created our own solution based on a mix of scripts that read 
from the nova/neutron DB, iptables and collectd data. No need for more 
collection agents than what we are already running for the systems monitoring.

We tried the version in Havana and, later, in Icehouse. For starters the 
documentation was suggesting MySQL as default backend. MySQL will last just a 
few days and then break down under the size of the tables. We tried MongoDB, 
but were still not satisfied with performance on such a small cluster.
Then there is the metering agent. It is yet another daemon, not integrated in 
Neutron and there is no documentation about what it is actually measuring. What 
if I have multiple routers? Ingress and Egress? From which point of view?
The same applies to Cinder: it requires an external agent (to be run via 
cron!).

Some metrics were not recorded, we couldn't understand why and, again, no 
documentation and no tooling to help us understand whether we were just missing 
some config options somewhere in nova-compute or there was some other problem 
with KVM/libvirt versions.
And even when we had some data and wanted to generate just a proof-of-concept 
report with some information about tenant resource usage, we found problems 
with the API. The fact that no one had bothered to write a simple proof of 
concept script that uses the API to actually do something useful was really 
off-putting.
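
For reference, a proof-of-concept report of that kind could be as small as the sketch below. It works on samples that have already been fetched and makes no API calls; the field names `project_id`, `counter_name` and `counter_volume` and the meter name `example.delta` are assumptions for illustration. Summing only makes sense for delta-type meters.

```python
from collections import defaultdict


def usage_by_tenant(samples, meter):
    """Sum the volume of one delta-type meter per tenant (project)."""
    totals = defaultdict(float)
    for sample in samples:
        if sample["counter_name"] == meter:
            totals[sample["project_id"]] += sample["counter_volume"]
    return dict(totals)


# Hypothetical samples, as they might look after being fetched and decoded.
samples = [
    {"project_id": "tenant-a", "counter_name": "example.delta", "counter_volume": 10.0},
    {"project_id": "tenant-b", "counter_name": "example.delta", "counter_volume": 7.0},
    {"project_id": "tenant-a", "counter_name": "example.delta", "counter_volume": 5.0},
    {"project_id": "tenant-a", "counter_name": "other.meter", "counter_volume": 99.0},
]

report = usage_by_tenant(samples, "example.delta")
for tenant, total in sorted(report.items()):
    print(f"{tenant}: {total}")
```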

We had to dig in libvirt to understand what some of the metrics actually mean.
We found that we could read those same metrics from our (more efficient, 
well-known) monitoring system.

For some time we ran just the agents and aggregated the data in an 
elasticsearch instance through the UDP msgpack pipeline (more bugs, message 
format is inconsistent, different agents generate different fields, in slightly 
different formats).
It works. But for our needs it was just too much work. Most of the data is 
already available from other sources with well-known APIs.

Ah, also there is a long standing bug open: Sahara and Ceilometer cannot be 
used together. And we use Sahara.

I opened bugs for some of these issues, but since then I lost interest.

In the end, I think it really depends on what kind of data you need and what 
(developer) resources you can throw at the problem.
Unless in Juno things changed dramatically, Ceilometer will not work out of the 
box. You will have to lose time because of the non-existent documentation, you 
will have to develop code and scripts anyway and finally you will have to 
create something between your billing system and the ceilometer API, because to 
the best of my knowledge there is nothing that uses it.

eBay has the resources to do all that. We don't.




Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-11 Thread Zeng, Bryant
@Maish Saidel-Keesing,

Hi Maish, I’m from eBay Inc, and we’re enabling 1000+ ceilometer compute 
agents. I hope our experience helps.

We chose an OpenTSDB backend instead of MongoDB from the start, so we 
avoided most of the issues related to MongoDB.

However, we still ran into a number of issues during deployment:

  1.  The libvirt inspector didn’t work in nova-cell mode. We fixed it by 
using the instance UUID to identify the VM, and submitted the fix upstream 
(https://bugs.launchpad.net/ceilometer/+bug/1396473).
  2.  The load on nova/glance was heavy enough to drag them down. We 
reduced the load in three ways:
 *   Shuffle the compute agents' trigger times to avoid simultaneous requests to 
the nova client. This has already been approved and merged upstream 
(https://bugs.launchpad.net/ceilometer/+bug/1412613).
 *   Add a cache layer for the nova discovery results of instances, which 
removes a lot of queries to the nova client. This is still under discussion 
upstream (https://review.openstack.org/#/c/153503/).
 *   Remove the flavor and image queries for VMs, since we don't need that info for now.
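
The first two load-reduction ideas can be sketched roughly as follows. This is an illustration only, not the code from the linked bug or review; `jittered_delay`, `TTLCache` and the `fetch` callable are hypothetical names.

```python
import random
import time


def jittered_delay(interval, rng=random):
    """Pick a random start offset in [0, interval] seconds so that agents
    with the same polling interval do not all fire at the same moment."""
    return rng.uniform(0, interval)


class TTLCache:
    """Cache the result of an expensive call (e.g. instance discovery)
    and only call it again after `ttl` seconds have passed."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires = None

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._fetch()
            self._expires = now + self._ttl
        return self._value


# Usage with a fake clock and a stand-in for the discovery query.
calls = []
fake_now = [0.0]


def discover_instances():
    calls.append(1)
    return ["vm-1", "vm-2"]


cache = TTLCache(discover_instances, ttl=60, clock=lambda: fake_now[0])
cache.get()
cache.get()          # served from cache, no second query
fake_now[0] = 61.0
cache.get()          # TTL expired, queries again

delay = jittered_delay(600, rng=random.Random(0))
```

A longer TTL means fewer queries but staler discovery results, so the value has to be chosen against how quickly instances come and go.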

Our original thinking about MongoDB was to store only some metadata definitions 
in it, and to put most other metrics into a time series DB.
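
As a rough illustration of that split (a sketch, not our actual implementation), one sample can be divided into a time-series data point and a metadata record. The data point uses the `metric`/`timestamp`/`value`/`tags` shape that OpenTSDB accepts; the sample field names are assumptions.

```python
def to_datapoint(sample):
    """Numeric part of a sample, shaped for a time series DB."""
    return {
        "metric": sample["counter_name"],
        "timestamp": int(sample["timestamp"]),
        "value": sample["counter_volume"],
        "tags": {
            "resource_id": sample["resource_id"],
            "project_id": sample["project_id"],
        },
    }


def to_metadata(sample):
    """Descriptive part of a sample, kept in a document store."""
    return {
        "resource_id": sample["resource_id"],
        "metadata": sample.get("resource_metadata", {}),
    }


# Hypothetical sample.
sample = {
    "counter_name": "cpu_util",
    "counter_volume": 42.5,
    "timestamp": 1423650000.7,
    "resource_id": "vm-1",
    "project_id": "tenant-a",
    "resource_metadata": {"flavor": "m1.small"},
}

point = to_datapoint(sample)
record = to_metadata(sample)
```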

So all in all, we think you could consider replacing MongoDB as your main storage 
backend, which may improve your Ceilometer performance.
Some performance-related code enhancements or modifications tailored to your 
conditions would also help.

Thanks,
Bryant(Cloud Team, eBay Inc)



Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-11 Thread George Shuklin

Ceilometer is in a sad state.

1. Collector leaks memory. We ran it on the same host as mongo, and it 
grabbed 29GB out of 32, leaving mongo with less than a gig of memory available.
2. Metering agent causes huge load on neutron-server: O(n) in metering 
rules and tenants. A few bugs reported, one bugfix in review.
3. Metering agent simply does not work on multi-network-node installations. 
It expects all routers to be on the same host. Fixed or not - I don't know, we 
have our own crude fix.
4. Many rough edges. Ceilometer is much less tested than nova. Sometimes it 
traces and skips counting. Fresh example: if metadata has '.' in the 
name, ceilometer traces on it and does not count it in glance usage.
5. Very slow on reports (using mongo's mapreduce).
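
On the '.' example in item 4: one plausible cause, and this is an assumption rather than something confirmed here, is that metadata keys end up as field names in the storage backend, where dots can be rejected or misread as nesting. A minimal sketch of a workaround is to rewrite the keys before storing; `sanitize_keys` is a hypothetical helper, not part of ceilometer.

```python
def sanitize_keys(value, replacement="_"):
    """Recursively replace '.' in dict keys so they are safe as field names.

    Note: two keys that differ only by '.' vs the replacement will collide.
    """
    if isinstance(value, dict):
        return {
            str(key).replace(".", replacement): sanitize_keys(item, replacement)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize_keys(item, replacement) for item in value]
    return value


# Hypothetical image metadata with dotted key names.
metadata = {"os.version": "14.04", "tags": [{"a.b": 1}], "size": 10}
clean = sanitize_keys(metadata)
```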

Overall feeling: barely usable, but given my experience with cloud 
billing, not the worst thing I have seen in my life.


About load: except for reporting and memory leaks, it uses a rather small 
amount of resources.







[Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-11 Thread Maish Saidel-Keesing

Is Ceilometer ready for prime time?

I would be interested in hearing from people who have deployed OpenStack 
clouds with Ceilometer, and their experience. Some of the topics I am 
looking for feedback on are:


- Database size
- MongoDB management, sharding, replica sets etc.
- Replication strategies
- Database backup/restore
- Overall usability
- Gripes, pains and problems (things to look out for)
- Possible replacements for Ceilometer that you have used instead


If you are willing to share - I am sure it will be beneficial to the 
whole community.


Thanks in Advance


With best regards,


Maish Saidel-Keesing
Platform Architect
Cisco



