Re: [prometheus-users] Mount Point Missing Alarm

Julius Volz Mon, 01 Mar 2021 07:44:00 -0800

Hi Saurabh,

For any calculations where you compare the current state to the past state
as a correctness check (where the past state represents the desired /
expected state), you always have some limitations: First, it's already
possible that mountpoints are missing before you even start collecting
data, in which case you would never be able to notice those missing mount
points. Second, Prometheus is a sliding window system, so any reference of
the current to the past will "slide" over your data, and eventually your
current state will become the past, whether it's in the originally desired
state or not (thus you will stop noticing problems at that time / alerts
will stop firing). For example, you can compare the current set of
mountpoints to the set 10 minutes ago, and you can get an alert if some
mountpoint went missing. But if you wait another 10 minutes, then when the
alert calculation runs again, both the current and old state used for
comparisons will no longer contain the now-missing mountpoints, so the
alert would stop firing.


Still, given those caveats, you could write an alert expression like this:

    node_filesystem_readonly offset 1h
  unless
    node_filesystem_readonly

This basically says "alert me if there was a filesystem 1h ago, unless it
is also currently present". But beware that this alert will auto-resolve
after 1 hour, due to the sliding window effect described above. To be
somewhat more resilient, you could increase the 1h to something like 1d. But
be aware that for the alert to work, you then need at least 1d of history in
your TSDB, so in a fresh Prometheus, the alert will need at least 1d before
it starts working.
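
As a rough sketch of the longer lookback, the same expression with the
offset raised to 1d might look like the following. The explicit
on (instance, mountpoint) matching is an assumption added here, not part of
the expression above: it is meant to keep a change in some other label (for
example device) from counting as a missing mountpoint on its own. Drop it to
match on all labels, as the original expression does.

```promql
# Returns one series for every (instance, mountpoint) pair that existed
# 1 day ago but has no series now. Needs at least 1d of data in the TSDB,
# and the result disappears again once the gap itself is 1d old.
    node_filesystem_readonly offset 1d
  unless on (instance, mountpoint)
    node_filesystem_readonly
```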

In the end, it's always better to have a proper authoritative source of
truth somewhere that tells the monitoring system which mountpoints are
expected (possibly, for each type of server or so), rather than relying on
past/current comparisons, but this can be a workaround.
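
If such a source of truth were exported as a metric, the comparison could be
sketched as below. The metric name expected_mountpoint is purely
hypothetical (nothing in the node exporter provides it); the sketch assumes
it has one series per mountpoint that should exist, with the same instance
and mountpoint labels as the filesystem metrics.

```promql
# expected_mountpoint is a hypothetical metric: one series per mountpoint
# that is supposed to exist on a given instance.
# The result contains every expected (instance, mountpoint) pair that has
# no matching node_filesystem_readonly series right now.
    expected_mountpoint
  unless on (instance, mountpoint)
    node_filesystem_readonly
```

Because the left-hand side describes the desired state rather than a past
state, this variant would keep returning the missing mountpoint for as long
as the expected-state series exists, instead of auto-resolving after the
offset has passed.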

Regards,
Julius

On Mon, Mar 1, 2021 at 2:57 PM [email protected] wrote:

>
> Hi Everyone,
>
> I have a specific requirement from the client that Prometheus should
> generate an alert if any mount point on a server goes missing.
>
> For example: if a server has 3 mount points such as /data1 /NFS1 /NFS2 and,
> for any reason, /NFS2 gets delinked from the server, Prometheus should
> generate an alert.
>
> When I tried the query below, it worked fine (as this metric goes
> missing when /NFS2 is delinked from the server):
>
> absent(node_filesystem_readonly{device="XX:/NFS2",fstype="nfs2",hostname="EAST_WB_XX",instance="XX:9100",job="XX",mountpoint="/NFS2"})
> == 1
>
> However, there are 800 servers that need to be monitored, so it is not
> practical to add 800 rules, one for each IP, to rules.yml.
>
> When I added the rule below, it did not generate the missing alert:
>
> absent(node_filesystem_readonly{mountpoint="/NFS2"}) == 1
>
> Please advise whether we can achieve this by tweaking the query so
> that it is generic for all servers.
>
> Looking forward to your response.
>
> Thanks,
> Saurabh


-- 
Julius Volz
PromLabs - promlabs.com

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAObpH5zCnO%2B_yLW1MRuZu1AvvoYgoF%3DS5%3D9sMLDPfcFQo1ti_A%40mail.gmail.com.
