Re: [PATCH/RFC 0/8] Email Alerts

Scott McKeown Thu, 12 Feb 2015 02:12:12 -0800

Hi Guys,

I have just finished testing the eMail Alerter update, and it looks to work
perfectly when the Real Server is removed because of a failure, i.e. when it
fails its health check:


Server l7-front1/jukebox is DOWN, reason: Layer4 timeout, check duration:
6001ms. 1 active and 1 backup servers left. 0 sessions active, 0 requeued,
0 remaining in queue

The one exception is setting the email-alert level to info: this fails with
what looks like a malformed SMTP handshake (well, on my SMTP server anyway).

That said, I think it would be a good idea to expand on this further and
have an option to set when we want to be notified. For example, I didn't
get an eMail when the Real Server was back online and added to the Backend,
even though the HAProxy logs showed that it was and that it was allowing
connections. Drain and Halt don't appear to send an eMail either, so my
thought is to add a new, optional setting that lets you choose when to be
informed. Something like:

email-alert notices up,down,drain,halt    - this would eMail on all options

email-alert notices up,down               - this would only eMail on Up or
                                            Down states
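
To make the idea a bit more concrete, here is a rough Python sketch of the
filtering I have in mind. It is not HAProxy code: the "notices" keyword is
only my suggestion above, and parse_notices and should_email are made-up
names used purely for illustration.

```python
# Hypothetical sketch of the proposed "email-alert notices" filter.
# Neither the keyword nor these helpers exist in HAProxy.

VALID_STATES = {"up", "down", "drain", "halt"}


def parse_notices(value):
    """Parse a comma-separated state list such as "up,down" into a set."""
    states = {s.strip().lower() for s in value.split(",") if s.strip()}
    unknown = states - VALID_STATES
    if unknown:
        raise ValueError("unknown state(s): " + ", ".join(sorted(unknown)))
    return states


def should_email(new_state, notices):
    """Return True if a transition to new_state should trigger an eMail."""
    return new_state.lower() in notices


# Example: only Up and Down transitions are mailed.
notices = parse_notices("up,down")
for state in ("up", "down", "drain", "halt"):
    print(state, should_email(state, notices))
```

Leaving the option out could simply keep today's behaviour, so existing
configurations would not change.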

What do you think?


~Yours,
Scott


On 5 February 2015 at 20:13, Pavlos Parissis <pavlos.paris...@gmail.com>
wrote:

> On 04/02/2015 01:26 AM, Simon Horman wrote:
> > On Tue, Feb 03, 2015 at 05:13:02PM +0100, Baptiste wrote:
> >> On Tue, Feb 3, 2015 at 4:59 PM, Pavlos Parissis
> >> <pavlos.paris...@gmail.com> wrote:
> >>> On 01/02/2015 03:15 PM, Willy Tarreau wrote:
> >>>> Hi Simon,
> >>>>
> >>>> On Fri, Jan 30, 2015 at 11:22:52AM +0900, Simon Horman wrote:
> >>>>> Hi Willy, Hi All,
> >>>>>
> >>>>> the purpose of this email is to solicit feedback on an implementation
> >>>>> of email alerts for haproxy, the design of which is based on a
> >>>>> discussion in this forum some months ago.
> >>>
> >>>
> >>> It would be great if we could use something like this
> >>> acl low_capacity nbsrv(foo_backend) lt 2
> >>> mail alert if low_capacity
> >>>
> >>> In some environments you only want to wake up the on-call sysadmin if
> >>> you are in real trouble, not because 1-2 servers failed.
> >>>
> >>> Nice work,
> >>> Pavlos
> >>>
> >>
> >>
> >>
> >> This might be doable using monitor-uri and monitor fail directives in
> >> a dedicated listen section which would fail if the number of servers in
> >> a monitored farm goes below a threshold.
> >>
> >> That said, this is a dirty hack.
> >
> > I agree entirely that there is a lot to be said for providing a facility
> > for alert suppression and escalation. To my mind the current
> > implementation, which internally works with a queue, lends itself to
> > these kinds of extensions. The key question in my mind is how to design
> > advanced features such as the one you have suggested in such a way that
> > they can be useful in a wide range of use-cases.
> >
> > So far there seem to be three semi-related ideas circulating
> > on this list. I have added a fourth:
> >
> > 1. Suppressing alerts based on priority.
> >    e.g. Only send alerts for events whose priority is > x.
> >
> > 2. Combining alerts into a single message.
> >    e.g. If n alerts are queued up to be sent within time t
> >         then send them in one message rather than n.
> >
> > 3. Escalate alerts
> >    e.g. Only send alerts of priority x if more than n have occurred
> >         within time t.
> >    This seems to be a combination of 1 and 2.
> >    This may or may not involve raising the priority of the resulting
> >    combined alert (internally or otherwise)
> >
> >    An extra qualification may be that the events need to relate to
> >    something common:
> >    e.g. servers of the same proxy
> >         Losing one may not be bad, losing all of them I may wish
> >         to get out of bed for
> >
> > 4. Suppressing transient alerts
> >    e.g. I may not care if server s goes down then comes back up again
> >         within time t.
> >    But I may if it keeps happening. This part seems like a variant of 3.
> >
> >
> > I expect we can grow this list of use-cases. I also think things
> > may become quite complex quite quickly. But it would be nice to implement
> > something not overly convoluted yet useful.
> >
>
>
> What you have done so far provides the basic 'monitoring' alert
> functionality, and it is the first step towards something that can become
> bigger and better, but complex, as you say.
>
> The functionality you have listed is covered by several monitoring
> systems, either 'dumb' ones like Nagios or 'smart' ones that apply
> real-time anomaly detection (Skyline, etc.), by either actively probing
> services or passively receiving events.
>
> HAProxy is another service inside a data center which produces
> events: servers going down/up, dips/spikes in traffic, etc.
>
> In small companies which can't afford to have a centralized monitoring
> system and prefer to just receive various e-mails from ~10 systems,
> having some monitoring intelligence (aggregation, alerts based on
> thresholds) built in is perfect and very much appreciated.
>
> But in a large installation, where you have 10K servers and 400 services,
> you want to receive raw events without any aggregation, and the 'smart'
> monitoring system will figure out what to do before it wakes up the
> on-call sysadmin (I am one of them).
>
> To sum up, the current data exposed over the stats socket satisfies the
> needs of a large installation. I know that because I am quite happy with
> the amount of data HAProxy exposes, and I work in an environment where we
> utilize these 'smart' monitoring systems.
>
> At my friend's start-up company, which has 8 services, I don't want to
> develop scripts/tools to pull info from the stats socket. Just mail me and
> I will alert myself based on the amount of e-mails I receive, and if
> HAProxy can do some kind of aggregation/thresholding then my mailbox will
> thank HAProxy a lot.
>
> I hope it helps and once again thanks for your hard work,
> Pavlos
>
>
>
>
>
>
>
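
For reference, ideas 2 and 4 from Simon's list quoted above (combining
alerts, and suppressing transient ones) could look roughly like the Python
sketch below. This is only an illustration under my own assumptions: the
function names, the (timestamp, server, state) event tuples and the exact
window semantics are invented here and say nothing about how the patch
series queues alerts internally.

```python
# Hypothetical sketch: drop transient down/up flaps, then batch what is left.
# Names and window semantics are assumptions, not HAProxy behaviour.


def suppress_transients(events, t):
    """Drop a 'down' that is followed by an 'up' for the same server within t.

    events: list of (timestamp, server, state) tuples sorted by timestamp.
    """
    dropped = set()
    last_down = {}  # server -> index of its most recent unmatched 'down'
    for i, (ts, server, state) in enumerate(events):
        if state == "down":
            last_down[server] = i
        elif state == "up" and server in last_down:
            j = last_down.pop(server)
            if ts - events[j][0] <= t:
                dropped.update((i, j))
    return [e for i, e in enumerate(events) if i not in dropped]


def batch(events, t):
    """Group events so that each message starts a window of at most t seconds."""
    messages = []
    for event in events:
        if messages and event[0] - messages[-1][0][0] <= t:
            messages[-1].append(event)
        else:
            messages.append([event])
    return messages


events = [
    (0, "s1", "down"),
    (2, "s1", "up"),     # s1 flapped within the window: both events dropped
    (3, "s2", "down"),
    (5, "s3", "down"),
    (60, "s2", "up"),    # s2 was down for a long time: both events kept
]
for message in batch(suppress_transients(events, 10), 10):
    print(message)
```

Escalation (idea 3) could then be a check on the size of each batch, but
that is left out to keep the sketch short.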


-- 
With Kind Regards.

Scott McKeown
Loadbalancer.org
http://www.loadbalancer.org
Tel (UK) - +44 (0) 3303801064 (24x7)
Tel (US) - +1 888.867.9504 (Toll Free)(24x7)
