Phew, that is going to be endless :D
I want to see:

Operating System / Hardware: system load, cpu user, cpu system, memory usage, swap, swap activity, used/free disk space, nic bw, nic packets, disk bw, disk iops, zombie procs, hdd smart, RAID status
Docker: cpu / memory per container, volume usage
OpenStack: # free FIPs, # instances, # volumes, # networks, # routers
HAproxy: # sessions per listener, http status codes per service
RabbitMQ: message rates, # messages in queue
Ceph: placement group states, available capacity, pool bw / iops, disk latency, journal latency

I want alerts for:
- node goes down
- ceph: mon goes down, osd goes down, pgs stuck for more than X seconds
- service goes down
- container up/down
- mariadb/galera: cluster health
- rabbitmq: cluster health

That's just off the top of my head :)

The reason why I don't want alerts on everything is that most solutions work with static thresholds, which is mostly useless. I prefer walking through my dashboards every morning and checking things myself.

cheers,
Mathias

On 24.07.2016 17:16, "Michał Jastrzębski" <inc...@gmail.com> wrote:
> Guys, thanks for all that!
>
> Can we for a second abstract this discussion from technology and start
> by lining up the scenarios we want to achieve? Then pick the software that
> will allow us to achieve all/most of the scenarios with the least amount of
> work/maintenance?
>
> So my scenarios:
>
> I want to see when health of docker service
> I want to see when message queue becomes saturated
> I want to see when RAM exceeds 70%
> I want to see when my network causes tons of retransmissions
> I want to see when one of nodes is down
>
> Did I miss anything? Which software stack would allow me to see these
> things?
>
> Cheers,
> Michal
>
> On 24 July 2016 at 09:09, Mathias Ewald <mew...@evoila.de> wrote:
> > I think Sensu is the best monitoring approach out there atm. Nagios /
> > Icinga are way too static and scale badly imho. The kind of checks you
> > proposed are quite interesting.
> > I would suggest wrapping a Sensu check around Tempest, but
> > that's going too far for the first cycle.
> >
> > The two stacks (Sensu + Uchiwa and TICK) only really overlap in metrics
> > collection, which can be done via Sensu and Telegraf. I don't know if it
> > makes sense to have both ... I definitely think we need Sensu though, simply
> > to monitor service availability and other thresholds and events which aren't
> > covered in TICK, as not everything is time series data, and to have the
> > alerting. With Sensu only, we don't have insight into performance and trends;
> > with TICK only, we lack alerting on events and non-performance metric data
> > (Is Keystone up? etc.)
> >
> > I think it won't hurt to develop these two stacks in parallel and maybe
> > we'll join them together in a chain as I described earlier.
> >
> > 2016-07-24 14:25 GMT+02:00 Dave Walker <em...@daviey.com>:
> >>
> >> Thanks Mathias,
> >>
> >> I'm not tied to Sensu.. anything can really fill that gap in my mind.
> >> You've done a good job at outlining the steps involved. I created a
> >> blueprint with the steps I had in mind[0]
> >>
> >> For this cycle, I wanted to keep it simple so it was easily achievable. I
> >> only planned to have some basic up/down for each node and throw the
> >> performance data on the floor.
> >>
> >> I wanted to include the option to include local configs, as json blobs.
> >> Some of the things I was thinking as local config:
> >> - daily checkouts, can instances be built with networking
> >> - remaining resources count (ie, does each subnet have X remaining ip
> >> addresses available)
> >> - Is Ceph healthy?
> >>
> >> So, these things aren't really performance over time interesting.. which
> >> means the intention does differ. However, I do agree that both stacks could
> >> achieve both objectives.
> >>
> >> I've essentially got much of this working locally, but would require about
> >> a day of cleaning up for submission...
> >> but if your work can achieve the
> >> objectives above, I'm happy to discontinue... or help make your stack
> >> pluggable.
> >>
> >> [0] https://blueprints.launchpad.net/kolla/+spec/sensu
> >>
> >> --
> >> Kind Regards,
> >> Dave Walker
> >>
> >> On 24 July 2016 at 11:56, Mathias Ewald <mew...@evoila.de> wrote:
> >>>
> >>> Monitoring is a difficult topic as the number of options regarding the
> >>> toolset and mechanisms is very high. We had some chats about it in IRC that
> >>> uncovered even more options than I thought existed :D I believe Dave's view
> >>> on Sensu is generally correct in that Sensu is more directed at monitoring
> >>> in the form of "is X running/working", but of course it has the ability to
> >>> transport metrics, too; it lacks good dashboarding capabilities for
> >>> performance data, though. One setup I could imagine is
> >>>
> >>> 1. Sensu Client to collect checks and metrics
> >>> 2. RabbitMQ for transport
> >>> 3. Sensu Server to receive, evaluate, alarm and write metrics to InfluxDB
> >>> 4. Uchiwa as a Dashboard to Sensu
> >>> 5. InfluxDB to store metrics
> >>> 6. Grafana to dashboard metrics
> >>>
> >>> So Sensu could be used as a replacement for (or in addition to) a metrics
> >>> collection daemon like Collectd or what I decided to use: Telegraf. For my
> >>> implementation, this means I will add a parameter to make Telegraf optional.
> >>> This way, someone else may implement the rest of the stack and the user can
> >>> decide which one to use.
> >>>
> >>> What do you think?
> >>>
> >>> Mathias
> >>>
> >>> 2016-07-23 21:51 GMT+02:00 Stephen Hindle <shin...@llnw.com>:
> >>>>
> >>>> My understanding was Sensu could produce metrics ?
> >>>> And Kapacitor can do alerting for the TICK stack stuff mewald is
> >>>> doing...
> >>>> I really don't see them as that different ?
> >>>>
> >>>> On Fri, Jul 22, 2016 at 5:19 PM, Dave Walker <em...@daviey.com> wrote:
> >>>> > Yes, this is my thought.
> >>>> >
> >>>> > The scope of the Sensu work is: "Is this thing working?" (with the
> >>>> > reference being up/down)
> >>>> > But the scope of Grafana and friends is: "How hard is this working?"
> >>>> > (but no alerting)
> >>>> >
> >>>> > They are certainly complementary.... However, Sensu can throw data at a
> >>>> > Grafana stack (aiui).. but I fear that is too much to achieve this
> >>>> > cycle.
> >>>> >
> >>>> > --
> >>>> > Kind Regards,
> >>>> > Dave Walker
> >>>> >
> >>>> > On 23 July 2016 at 00:11, Fox, Kevin M <kevin....@pnnl.gov> wrote:
> >>>> >>
> >>>> >> I think those are two different, complementary things.
> >>>> >>
> >>>> >> One's metrics and the other is monitoring. You probably want both at the
> >>>> >> same time.
> >>>> >>
> >>>> >> Thanks,
> >>>> >> Kevin
> >>>> >> ________________________________________
> >>>> >> From: Steven Dake (stdake) [std...@cisco.com]
> >>>> >> Sent: Friday, July 22, 2016 3:52 PM
> >>>> >> To: OpenStack Development Mailing List (not for usage questions)
> >>>> >> Subject: Re: [openstack-dev] [kolla] Monitoring tooling
> >>>> >>
> >>>> >> Thanks for pointing that out. Brain out to lunch today it appears :(
> >>>> >>
> >>>> >> I think choices are a good thing even though they increase our
> >>>> >> implementation footprint. Anyone opposed to implementing both with
> >>>> >> something in globals.yml like
> >>>> >> monitoring: grafana or
> >>>> >> monitoring: sensu
> >>>> >>
> >>>> >> Comments questions or concerns welcome.
> >>>> >>
> >>>> >> Regards
> >>>> >> -steve
> >>>> >>
> >>>> >> On 7/22/16, 3:42 PM, "Stephen Hindle" <shin...@llnw.com> wrote:
> >>>> >>
> >>>> >> >Don't forget mewald's implementation as well - we now have 2
> >>>> >> >monitoring options for kolla :-)
> >>>> >> >
> >>>> >> >On Fri, Jul 22, 2016 at 3:15 PM, Steven Dake (stdake)
> >>>> >> ><std...@cisco.com> wrote:
> >>>> >> >> Hi folks,
> >>>> >> >>
> >>>> >> >> At the midcycle we decided to push off implementing Monitoring until
> >>>> >> >> post Newton. The rationale for this decision was that the core review
> >>>> >> >> team has enough on their plates and nobody was super keen to implement
> >>>> >> >> any monitoring solution given our other priorities.
> >>>> >> >>
> >>>> >> >> Like all good things, communities produce new folks that want to do new
> >>>> >> >> things, and Sensu was proposed as Kolla's monitoring solution (at least
> >>>> >> >> the first one). A developer that has done some good work has shown up
> >>>> >> >> to do the job as well :) I have heard good things about Sensu, minus
> >>>> >> >> the fact that it is implemented in Ruby and I fear it may end up
> >>>> >> >> causing our gate a lot of hassle.
> >>>> >> >>
> >>>> >> >> https://review.openstack.org/#/c/341861/
> >>>> >> >>
> >>>> >> >> Anyway I think we can work through the gate problem.
> >>>> >> >>
> >>>> >> >> Does anyone have any better suggestion? I'd like to unblock Dave's
> >>>> >> >> work which is blocked on a 2 pending a complete discussion of our
> >>>> >> >> monitoring solution. Note we may end up implementing more than one
> >>>> >> >> down the road
> >>>> >> >>
> >>>> >> >> Sensu is just where the original interest was.
> >>>> >> >>
> >>>> >> >> Please provide feedback, even if you don't have a preference, whether
> >>>> >> >> you're a core reviewer or not.
> >>>> >> >>
> >>>> >> >> My take is we can merge this work in non-priority order, and if it
> >>>> >> >> makes the end of the cycle, fantastic; if not, we can release it in
> >>>> >> >> Ocata.
> >>>> >> >>
> >>>> >> >> Regards
> >>>> >> >> -steve
> >>>> >> >
> >>>> >> >--
> >>>> >> >Stephen Hindle - Senior Systems Engineer
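PS: To illustrate the remark above about static thresholds, here is a minimal Python sketch, not taken from any of the tools discussed in this thread. The function names, the 70% limit (borrowed from the RAM scenario quoted above), the factor k and the sample values are all assumptions made up for illustration. It compares a fixed threshold with a check against a node's own recent baseline.

```python
from statistics import mean, stdev


def static_alert(value, threshold=70.0):
    """Fire whenever the sample exceeds a fixed threshold (hypothetical 70%)."""
    return value > threshold


def baseline_alert(history, value, k=3.0):
    """Fire when the sample deviates more than k standard deviations
    from the mean of the recent history (k is an assumed tuning knob)."""
    if len(history) < 2:
        return False  # not enough data to build a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


# Made-up RAM usage samples in percent.
busy_node = [79.0, 81.0, 80.0, 82.0, 78.0, 80.0]  # always sits around 80%
idle_node = [19.0, 21.0, 20.0, 22.0, 18.0, 20.0]  # normally around 20%

print(static_alert(81.0), baseline_alert(busy_node, 81.0))
print(static_alert(60.0), baseline_alert(idle_node, 60.0))
```

With these made-up samples, the static check fires for the node that always sits around 80% RAM, while the baseline check stays quiet there and instead flags the idle node that suddenly jumps to 60%.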
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev