Hi Jeroen,

On Tue, Aug 22, 2017 at 12:27 PM, Jeroen Baten <[email protected]> wrote:
> In case you missed my earlier reply (lost at the bottom of an old thread
> in your email client :-) ) I repost it like this.

Good point, I still owe you a response on that thread. Here it comes :)

> Having thought about devops and monitoring I must admit that I am not
> happy about where it was heading.
>
> I love LPI's generic and practical approach, so I spent some time on
> that regarding devops and monitoring.
>
> Yes, a devops guy needs to know about monitoring.
> Yes, he should know that there are a few popular open source projects
> that do monitoring: Nagios, Icinga, Zabbix, Prometheus (if you must
> insist, although I think it is not nearly mature enough).
>
> No, he should not become an expert in one of these packages.
> (Well, I could say it must be Zabbix, but 10 to 1 somebody will see that
> completely differently.)

We had differing opinions on whether or not to test a specific product / project. The problem with concepts is that they are hard to test. Examples ease this a lot because they avoid long verbal explanations. Specifying a specific tool might also provide guidance for candidates who are new to a topic, because it gives them a point to start their study (and maybe they learn enough to pick another solution that better serves their needs).

Take email servers as an example: one might try to test email delivery on a conceptual level only, but (for very good reasons) we're testing Postfix in LPIC-2. In fact, we used to test several MTAs in former times.

After all, we have to find the right balance between having some meat on the bones (by having examples for the concepts we test), being useful (to those who use the objectives to learn a new topic) and being efficient (to those who know a different tool and prepare for the exam). All this has to be decided from the candidate's perspective.
For the DevOps Tools Engineer exam, the candidate's focus is microservices in a dynamic environment where new containers / VMs are spawned automatically, potentially at a very high frequency, potentially triggered by some automatism. This has significant influence on how monitoring works. Keeping track of a dynamic environment requires a monitoring system to use some kind of service discovery. Furthermore, tools like Kubernetes can detect the failure of a container/pod and restart it automatically. Monitoring the old pod wouldn't be a great benefit; instead, the number of container failures or pod restarts might be a better indicator of problems, since the failure of a single container/pod might not affect the overall availability of a service. This shifts the interest from a single server/container to services, and from simple up/down checks to more detailed metrics. The main reason why we reconsidered Icinga2 was these requirements and how easily they can be fulfilled.

> But we can tell students about things like:
> - Be aware of sizing. The amount of monitoring information is the
>   number of items times the number of servers.

Not necessarily. It could also be the general availability of a service, no matter how many (virtual/containerized) servers provide the service. It might also be the overall rate of failing requests, the overall number of certain API calls, the overall number of available processing nodes, the number of container restarts...

> - Know the difference between storing in an rrd database, a sql
>   database or an elastic database, and the difference in housekeeping.

Here we run into the same problems as mentioned above: in a general approach, questions on these topics can easily become vague, while using a specific example requires us to make a choice.
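To make the service-discovery point a bit more concrete, here is a minimal sketch of what this could look like in a Prometheus scrape configuration (the job name and the annotation convention are my own illustration, not part of any objective): Prometheus asks the Kubernetes API for the current set of pods instead of relying on a static host list.

```yaml
# prometheus.yml fragment -- illustrative sketch only
scrape_configs:
  - job_name: 'kubernetes-pods'
    # Discover scrape targets through the Kubernetes API; newly spawned
    # pods are picked up automatically, no static target list needed.
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods that opted in via an annotation
      # (a common convention, not a requirement).
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

An alerting expression such as rate(kube_pod_container_status_restarts_total[15m]) > 0 would then cover the "number of pod restarts" indicator mentioned above (assuming kube-state-metrics is deployed to expose that metric).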
> - The sort-of standard way how return/errorlevels are organised:
>
>   Nagios/Icinga:
>
>     Plugin Return Code   Service State   Host State
>     0                    OK              UP
>     1                    WARNING         UP or DOWN/UNREACHABLE*
>     2                    CRITICAL        DOWN/UNREACHABLE
>     3                    UNKNOWN         DOWN/UNREACHABLE
>
>   Zabbix: Any exit code that is different from 0 is considered an
>   execution failure.
>
>   Prometheus: ? (couldn't find it, pointers welcome)

Short answer: there is the "up" time series, which might be an initial indicator.

Longer answer: what defines a warning / critical / ok state of a service or an application? A lot of these definitions stem from metrics. Nagios and Icinga allow us to collect performance data, but they are pretty static in how they interpret it (basically warning / critical thresholds). Prometheus can also collect multiple metrics and can be configured to alert on thresholds. For Nagios / Icinga(2), storing performance data over a longer period of time requires additional helpers; I heard the cool kids use datastores like InfluxDB or Graphite and dashboards like Grafana -- which basically ends up with the same dashboards Prometheus creates.

The result, in terms of what people want when monitoring their microservices, seems to be pretty similar, although Prometheus seems to be the easier way to get there. That doesn't mean there is no use case for Nagios / Icinga(2); but given the context of the DevOps Tools Engineer exam, Prometheus seems to be the better fit.

I know this is arguable, but I hope this makes the motivation of the change to Prometheus a little more transparent. Feel free to follow up :)

Fabian

PS: For those of you who like numbers:

    curl -s https://hub.docker.com/v2/repositories/prom/prometheus/ | jq '{ pull_count }'

On 17-08-17 at 15:55, Jeroen Baten wrote:
> > I thought I'd give Prometheus a try.
> > I really don't understand the enthusiasm.
> > All I can see is that the agent sends data to the server and I can
> > get a graph for the data.
> >
> > Maybe I am mistaken, but I can't see things like templates
> > (pre-configured lists of triggers and data) like in Zabbix, or how to
> > configure alerts.
> >
> > And if I want a Prometheus dashboard I have to install Rails and
> > install PromDash. But even then I have to add all the metrics that I
> > need to monitor.
> >
> > I want to be able to daily add servers from a cmdb to my monitoring
> > solution and attach some templates.
> > I want to not be bothered by metrics unless something goes wrong.
> >
> > AFAICT Prometheus is nice for small scale projects but that's it.
> >
> > Am I maybe missing something?
> >
> > regards,
> > Jeroen
>
> --
> Jeroen Baten | EMAIL : [email protected]
>  ____ _  __  | web   : www.i2rs.nl
> |  )|_)(_    | tel   : +31 (0)345 - 75 26 28
> _|_/_| \__)  | Molenwindsingel 46, 4105 HK, Culemborg, the Netherlands
>
> _______________________________________________
> lpi-examdev mailing list
> [email protected]
> http://list.lpi.org/cgi-bin/mailman/listinfo/lpi-examdev
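PPS: To make the Nagios/Icinga exit-code table further up a bit more concrete, here is a minimal plugin sketch in shell. The 90/95 % thresholds and the check_usage name are made up for the example; real plugins like check_disk behave along the same lines.

```shell
#!/bin/sh
# Sketch of a Nagios/Icinga-style plugin. The exit code carries the
# state (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN); the output line is
# the human-readable status, optionally followed by "|" and
# performance data. Thresholds and names are illustrative only.

check_usage() {
    usage=$1
    if [ -z "$usage" ]; then
        echo "DISK UNKNOWN - could not determine usage"
        return 3
    elif [ "$usage" -ge 95 ]; then
        echo "DISK CRITICAL - ${usage}% used | usage=${usage}%"
        return 2
    elif [ "$usage" -ge 90 ]; then
        echo "DISK WARNING - ${usage}% used | usage=${usage}%"
        return 1
    else
        echo "DISK OK - ${usage}% used | usage=${usage}%"
        return 0
    fi
}

check_usage 42   # prints "DISK OK - 42% used | usage=42%", exits 0
```

In real use the argument would come from something like `df -P / | awk 'NR==2 { sub(/%/, "", $5); print $5 }'`; Zabbix, by contrast, would treat any non-zero exit of such a script as an execution failure rather than a state.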
