Hi Jeroen,

On Tue, Aug 22, 2017 at 12:27 PM, Jeroen Baten <[email protected]> wrote:
> In case you missed my earlier reply (lost at the bottom of an old thread
> in your email client :-) ) I repost it like this.

Good point, I still owe you a response on that thread. Here it comes :)

> Having thought about devops and monitoring I must admit that I am not
> happy about where it was heading.
>
> I love LPI's generic and practical approach, so I spent some time on
> that regarding devops and monitoring.
>
> Yes, a devops guy needs to know about monitoring.
> Yes, he should know that there are a few popular open source projects
> that do monitoring: Nagios, Icinga, Zabbix, Prometheus (if you must
> insist, although I think it is not nearly mature enough).
>
> No, he should not become an expert in one of these packages.
> (Well, I could say it must be Zabbix, but 10 to 1 somebody will see that
> completely differently.)

We had differing opinions on whether or not to test a specific product / project. The problem with concepts is that they are hard to test. Examples ease this a lot because they avoid long verbal explanations. Specifying a specific tool might also provide guidance for candidates who are new to a topic, because it gives them a point to start their study (and maybe they learn enough to pick another solution that better serves their needs).

Take email servers as an example: one might try to test email delivery on a conceptual level only, but (for very good reasons) we're testing Postfix in LPIC-2. In fact, we used to test several MTAs in former times.

After all, we have to find the right balance between having some meat on the bones (by having examples for the concepts we test), being useful (to those who use the objectives to learn a new topic) and being efficient (to those who know a different tool and prepare for the exam). All this has to be decided from the candidate's perspective.
For the DevOps Tools Engineer exam, the candidate's focus is microservices in a dynamic environment where new containers / VMs are spawned automatically, potentially at a very high frequency, potentially triggered by some automatism. This has significant influence on how monitoring works. Keeping track of a dynamic environment requires a monitoring system to use some kind of service discovery. Furthermore, tools like Kubernetes can detect the failure of a container/pod and restart it automatically. Monitoring the old pod wouldn't be a great benefit; instead, the number of container failures or pod restarts might be a better indicator of problems, since the failure of a single container/pod might not affect the overall availability of a service. This shifts the interest from a single server/container to services, and from simple up/down checks to more detailed metrics. The main reason why we reconsidered Icinga2 was these requirements and how easily they can be fulfilled.

> But we can tell students about things like:
> - Be aware of sizing. The amount of monitoring information is the
>   number of items times the number of servers.

Not necessarily. It could also be the general availability of a service, no matter how many (virtual/containerized) servers provide the service. It might also be the overall rate of failing requests, the overall number of certain API calls, the overall number of available processing nodes, the number of container restarts...

> - Know the difference between storing in an rrd database, a sql
>   database or an elastic database, and the difference in housekeeping.

Here we run into the same problems as mentioned above: in a general approach, questions on these topics can easily become vague, while using a specific example requires us to make a choice.
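To make the service-discovery point a bit more concrete, here is a minimal sketch of what this could look like in a Prometheus scrape configuration (the job name and the annotation convention are my own illustration, not part of any objective): Prometheus asks the Kubernetes API for the current set of pods instead of relying on a static host list.

```yaml
# prometheus.yml fragment -- illustrative sketch only
scrape_configs:
  - job_name: 'kubernetes-pods'
    # Discover scrape targets through the Kubernetes API; newly spawned
    # pods are picked up automatically, no static target list needed.
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods that opted in via an annotation
      # (a common convention, not a requirement).
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

An alerting expression such as rate(kube_pod_container_status_restarts_total[15m]) > 0 would then cover the "number of pod restarts" indicator mentioned above (assuming kube-state-metrics is deployed to expose that metric).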
> - The sort-of standard way how return/errorlevels are organised:
>
>   Nagios/Icinga:
>
>     Plugin Return Code   Service State   Host State
>     0                    OK              UP
>     1                    WARNING         UP or DOWN/UNREACHABLE*
>     2                    CRITICAL        DOWN/UNREACHABLE
>     3                    UNKNOWN         DOWN/UNREACHABLE
>
>   Zabbix: Any exit code that is different from 0 is considered an
>   execution failure.
>
>   Prometheus: ? (couldn't find it, pointers welcome)

Short answer: there is the "up" time series, which might be an initial indicator.

Longer answer: what defines a warning / critical / ok state of a service or an application? A lot of these definitions stem from metrics. Nagios and Icinga allow us to collect performance data, but they are pretty static in how they interpret it (basically warning / critical thresholds). Prometheus can also collect multiple metrics and can be configured to alert on thresholds. For Nagios / Icinga(2), storing performance data over a longer period of time requires additional helpers; I heard the cool kids use datastores like InfluxDB or Graphite and dashboards like Grafana -- which basically ends up with the same dashboards Prometheus creates.

The result, in terms of what people want when monitoring their microservices, seems to be pretty similar, although Prometheus seems to be the easier way to get there. That doesn't mean there is no use case for Nagios / Icinga(2); but given the context of the DevOps Tools Engineer exam, Prometheus seems to be the better fit.

I know this is arguable, but I hope this makes the motivation of the change to Prometheus a little more transparent. Feel free to follow up :)

Fabian

PS: For those of you who like numbers:

    curl -s https://hub.docker.com/v2/repositories/prom/prometheus/ | jq '{ pull_count }'

On 17-08-17 at 15:55, Jeroen Baten wrote:
> > I thought I'd give Prometheus a try.
> > I really don't understand the enthusiasm.
> > All I can see is that the agent sends data to the server and I can
> > get a graph for the data.
> >
> > Maybe I am mistaken, but I can't see things like templates
> > (pre-configured lists of triggers and data) like in Zabbix, or how to
> > configure alerts.
> >
> > And if I want a Prometheus dashboard I have to install Rails and
> > install PromDash. But even then I have to add all the metrics that I
> > need to monitor.
> >
> > I want to be able to daily add servers from a cmdb to my monitoring
> > solution and attach some templates.
> > I want to not be bothered by metrics unless something goes wrong.
> >
> > AFAICT Prometheus is nice for small scale projects but that's it.
> >
> > Am I maybe missing something?
> >
> > regards,
> > Jeroen
>
> --
> Jeroen Baten | EMAIL : [email protected]
>  ____ _  __  | web   : www.i2rs.nl
> |  )|_)(_    | tel   : +31 (0)345 - 75 26 28
> _|_/_| \__)  | Molenwindsingel 46, 4105 HK, Culemborg, the Netherlands
>
> _______________________________________________
> lpi-examdev mailing list
> [email protected]
> http://list.lpi.org/cgi-bin/mailman/listinfo/lpi-examdev
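PPS: To make the Nagios/Icinga exit-code table further up a bit more concrete, here is a minimal plugin sketch in shell. The 90/95 % thresholds and the check_usage name are made up for the example; real plugins like check_disk behave along the same lines.

```shell
#!/bin/sh
# Sketch of a Nagios/Icinga-style plugin. The exit code carries the
# state (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN); the output line is
# the human-readable status, optionally followed by "|" and
# performance data. Thresholds and names are illustrative only.

check_usage() {
    usage=$1
    if [ -z "$usage" ]; then
        echo "DISK UNKNOWN - could not determine usage"
        return 3
    elif [ "$usage" -ge 95 ]; then
        echo "DISK CRITICAL - ${usage}% used | usage=${usage}%"
        return 2
    elif [ "$usage" -ge 90 ]; then
        echo "DISK WARNING - ${usage}% used | usage=${usage}%"
        return 1
    else
        echo "DISK OK - ${usage}% used | usage=${usage}%"
        return 0
    fi
}

check_usage 42   # prints "DISK OK - 42% used | usage=42%", exits 0
```

In real use the argument would come from something like `df -P / | awk 'NR==2 { sub(/%/, "", $5); print $5 }'`; Zabbix, by contrast, would treat any non-zero exit of such a script as an execution failure rather than a state.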
