Re: [lopsa-discuss] Metrics vs Monitoring...

Alan Robertson Mon, 13 Jan 2014 08:54:07 -0800

The open source Assimilation project discovers servers, services,
dependencies, and switch connections, and puts it all in a graph
database.


It then automatically initiates functional monitoring of the things it
recognizes.

Although this project isn't ready for production use, it should be ready
for trial deployments in a few months (I started it in 2010).

It knows how to do functional tests on databases, web servers, etc.

For example, for databases, it does a cheap query that it knows the
answer to.  For web servers, it looks for a 200 return code from the web
page and content that matches a regex (or more sophisticated things).
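
As a rough illustration of those two kinds of checks (this is not the
Assimilation project's code; the function names and the use of SQLite as a
stand-in database are assumptions made for the sketch):

```python
import re
import sqlite3


def check_http_response(status, body, pattern):
    """Pass only if the status code is 200 and the body matches the regex."""
    return status == 200 and re.search(pattern, body) is not None


def check_database(conn):
    """Run a cheap query whose answer is known in advance (SELECT 1 -> 1)."""
    try:
        row = conn.execute("SELECT 1").fetchone()
    except sqlite3.Error:
        # Any database error counts as a failed functional test.
        return False
    return row == (1,)


if __name__ == "__main__":
    # Hypothetical stand-ins: an in-memory database and a canned web response.
    conn = sqlite3.connect(":memory:")
    print("database ok:", check_database(conn))
    print("web ok:", check_http_response(200, "<h1>Welcome</h1>", r"Welcome"))
```

The point in both cases is that the check exercises the service and compares
the result to a known answer, rather than only confirming that a process is
running.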

It is not based on, or even remotely related to, Nagios.  Its method of
monitoring is much closer to that of HA systems like Pacemaker.

The project web site is at http://assimproj.org/

There is a video or two giving a technical overview there.

I gave 3 talks on it at LCA (linux.conf.au) last week, and all got very
enthusiastic receptions.  There are videos of those talks as well, but I
haven't yet added links to them on the web site.

It does not yet collect time-series data - so it currently only does
exception monitoring.


On 01/13/2014 09:30 AM, Nathan Hruby wrote:
> Hi,
>
> I'm kicking over similar thoughts now.
>
> Metrics and log stream analysis I think go a long way, but actual
> monitoring of a system in total still needs to be done.  I think right
> now, for me, monitoring is moving toward an automated functional test
> (hit endpoint, query data, get some expected result, etc..) and less
> "is webserver up."  But, some critical components in the pipeline will
> still need dedicated monitoring to make fault isolation easier and
> solve questions like "if the query failed is that because the message
> bus is busted, the backend service is funked up, the database is
> batty, or my data is just plain fried?"  E.g., alert on functionality
> failures but have the system give me the proper red lights on systems
> to point me in the right direction at response time.
>
> Also, handling dependencies for those events + metrics alerts is still
> something I'm trying to figure out in a way that avoids alert fatigue.
> Most (all?) of the metrics-type packages don’t have any sort of notion
> of "if the core switch is unpingable, don't alert me about the 57 web
> servers behind it."  Flapjack might help with this, but it's totally
> overkill for a lot of instances.
>
> As far as I can tell, the ideal solution (for me) seems to be to use
> nagios/sensu/whatever to aggregate functional tests, standard
> healthcheck monitoring, and metric threshold analysis into a
> comprehensive "view" of system state, and then alert on that.
>
> NOTE: I don't claim this is right or sane; I'm interested in the
> ensuing discussion.
>
> Thanks!
>
> -n
>
> On Mon, Jan 13, 2014 at 9:04 AM, Matthew Barr <[email protected]> wrote:
>> So, I’ve recently been reading up on the #monitoringsucks tags, their
>> responses, and some of the various things that have come out of it.
>> I’m in a new shop, AWS based, so many of the old standbys aren’t quite as
>> much of an obvious call anymore.
>>
>> What I’m now trying to figure out is what I’m missing, or would lose, by 
>> going with a newer paradigm for monitoring.
>>
>>
>> Anyone using Riemann yet?   Do you still use nagios / sensu / etc?
>>
>> Basically, Riemann operates on a stream of metrics, vs. relying on a
>> check every X min.
>>
>> I’m trying to determine what I’ve lost by not implementing a nagios-style
>> system, to basically cron checks.   (the alerting & state stuff I’m pretty
>> confident I’m not losing.)
>>
>>
>> For example: I had initially thought I’d lose a check of the web site every 
>> X min, but the load balancer does that anyways, and that triggers log and 
>> metrics about page speed return.
>>
>> I think that as you scale, you start getting even more data & metrics, and 
>> the need for manual injection of jobs becomes smaller.
>>
>>
>> I’m curious about people’s thoughts on this…
>>
>>
>> Matthew
>> [email protected]
>>
>> _______________________________________________
>> Discuss mailing list
>> [email protected]
>> https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
>> This list provided by the League of Professional System Administrators
>>  http://lopsa.org/
>
>


-- 
    Alan Robertson <[email protected]> - @OSSAlanR

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce