2016-02-16 12:27 GMT+01:00 singh.janmejay <[email protected]>:
> @Thomas: This is not about testing and quantifying loss during a test.
> It's about quantifying it during normal operation. I see it as a choice
> between:
>
> A. deploy the strongest protocol at every system boundary, test each one
>    and each change rigorously to identify or bound loss under test
>    conditions, and expect nothing unexpected to show up in production
> B. do the former and also measure loss in production, to identify that
>    something unexpected happened
> C. deploy efficient protocols at all system boundaries and measure loss
>    (as long as loss stays within an acceptable level, the deployment
>    benefits from all the efficiency gains)
>
> I am talking in the context of C.
>
> If/when loss is above the acceptable level, one needs to debug and fix
> the problem. Both B and C provide the data required to identify the
> situation(s) when such debugging needs to happen.
>
> The approach of stamping on one end and measuring on the other treats
> all intermediate hops as a black box. For instance, it can be used to
> quantify losses in the face of frequent machine failures or
> downtime-free maintenance etc.
>
> @David: As of now, I am thinking of end-of-the-day style measurement
> (basically report the number of messages lost at a good-enough
> granularity, say host x severity).
>
> I am thinking of this as something independent of the frequency of
> outages and unrelated to maintenance windows. I'm thinking of it as a
> report that captures the extent of loss, where one can pull down several
> months of this data and verify loss was never beyond an acceptable
> level, and compare it across days when the load profile was very
> different (the day when too many circuit-breakers engaged etc).

I just wanted to push in a link to an upcoming new feature:
https://github.com/rsyslog/rsyslog/pull/764

Rainer

> I haven't thought through this, but reset may not be required.
> Basically, let the counter count up and wrap around (as long as
> wrap-around is well-defined behavior which is accounted for during
> measurement).
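
Just to make the wrap-around accounting above concrete, a minimal sketch in
Python (illustrative only, not rsyslog code; the 32-bit counter width and
the function names are assumptions):

    # Illustrative sketch: delta between two successive readings of a
    # free-running counter that is never reset and may wrap around.
    # COUNTER_BITS is an assumption; use whatever width the counter has.
    COUNTER_BITS = 32
    COUNTER_MOD = 1 << COUNTER_BITS

    def delta(prev, curr):
        # Correct as long as the counter wraps at most once between samples.
        return (curr - prev) % COUNTER_MOD

    # Example: the counter wrapped between readings
    assert delta(4294967290, 5) == 11

The same pairwise delta also works for counters that are simply never
reset, as discussed further down in the thread.
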
> On Sat, Feb 13, 2016 at 5:13 AM, David Lang <[email protected]> wrote:
> > On Sat, 13 Feb 2016, singh.janmejay wrote:
> >
> >> The ideal solution would be one that identifies host, log-source and
> >> time of loss along with an accurate number of messages lost.
> >>
> >> pstats makes sense, but correlating data from stats across a large
> >> number of machines will be difficult (some machines may send stats
> >> slightly delayed, which may skew aggregation etc).
> >
> > if you don't reset the counters, they keep increasing, so over time
> > the error due to the slew becomes a very minor component.
> >
> >> One approach I can think of: slap a stream-identifier and
> >> sequence-number on each received message, then find gaps in the
> >> sequence numbers for a session-id on the other side (as a query over
> >> the log-store etc).
> >
> > I'll point out that generating/checking a monotonic sequence number
> > destroys parallelism, and so it can seriously hurt performance.
> >
> > Are you trying to detect problems 'on the fly' as they happen? Or at
> > the end of the hour/day, saying 'hey, there was a problem at some
> > point'?
> >
> > How frequent do you think problems are? I would suggest that you run
> > some stress tests on your equipment/network and push things until you
> > do have problems, so you can track when they happen. I expect that you
> > will find that they don't start happening until you have much higher
> > loads than you expect (at least after a bit of tuning), and this can
> > make it so that the most invasive solutions aren't needed.
> >
> > David Lang
> >
> >> Large issues such as a producer suddenly going silent can be detected
> >> using macro mechanisms (like pstats).
> >>
> >> On Sat, Feb 13, 2016 at 2:56 AM, David Lang <[email protected]> wrote:
> >>> On Sat, 13 Feb 2016, Andre wrote:
> >>>
> >>>> The easiest way I found to do that is to have a control system and
> >>>> send two streams of data to two or more different destinations.
> >>>>
> >>>> In the case of rsyslog processing a large message volume over UDP,
> >>>> the loss has always been noticeable.
> >>>
> >>> this depends on your setup. I was able to send UDP logs at gig-E
> >>> wire speed with no losses, but it required tuning the receiving
> >>> system to not do DNS lookups, have sufficient RAM for buffering, etc.
> >>>
> >>> I never was able to get my hands on 10G equipment to push up from
> >>> there.
> >>>
> >>> David Lang
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com
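
To make the "stream-identifier + sequence-number, find gaps" idea quoted
above a bit more concrete, a rough sketch (purely illustrative, not an
existing rsyslog feature; stream ids, names and the reporting granularity
are assumptions, and reordering and counter wrap-around are ignored):

    # Illustrative sketch of gap-based loss counting on the receiver side.
    # All names are made up for illustration; this is not an rsyslog API.
    from collections import defaultdict

    last_seq = {}            # stream-id -> last sequence number seen
    lost = defaultdict(int)  # stream-id -> messages presumed lost

    def on_message(stream_id, seq):
        prev = last_seq.get(stream_id)
        if prev is not None and seq > prev + 1:
            lost[stream_id] += seq - prev - 1   # gap => presumed loss
        last_seq[stream_id] = seq

    # End-of-day report; a stream id could encode host and severity to
    # match the 'host x severity' granularity discussed above.
    def report():
        return dict(lost)

    # Example: stream "hostA:sev6" delivers seqs 1, 2, 5 -> 2 presumed lost
    for s in (1, 2, 5):
        on_message("hostA:sev6", s)
    assert report() == {"hostA:sev6": 2}

Because each stream keeps its own counter, the senders never have to agree
on one global monotonic sequence (the parallelism concern David raises);
the price is that reordering and sender restarts have to be handled
explicitly, which this sketch does not do.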

