2015-03-12 18:16 GMT+01:00 singh.janmejay <singh.janme...@gmail.com>:

> On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards
> <rgerha...@hq.adiscon.com> wrote:
> > 2015-03-12 16:41 GMT+01:00 David Lang <da...@lang.hm>:
> >
> >> On Thu, 12 Mar 2015, Rainer Gerhards wrote:
> >>
> >>  2015-03-12 5:55 GMT+01:00 singh.janmejay <singh.janme...@gmail.com>:
> >>>
> >>>  On Thu, Mar 12, 2015 at 9:19 AM, David Lang <da...@lang.hm> wrote:
> >>>>
> >>>>> On Thu, 12 Mar 2015, singh.janmejay wrote:
> >>>>>
> >>>>>  Tried re-ordering it? Put the one with /port first?
> >>>>>>
> >>>>>
> >>>>>
> >>>>> no, lognorm rules are not supposed to be order dependent, so I didn't
> >>>>> try that (especially after finding things failing to parse with rsyslog
> >>>>> that worked manually)
> >>>>>
> >>>>
> >>>> In case of input strings being matching-rule-wise disjoint, you are
> >>>> right, order won't matter. But when they are not disjoint, order does
> >>>> matter, because the first one to match the string wins.
> >>>>
> >>>> Consider this rulebase:
> >>>> rule=:%ip:ipv4%%last:rest%
> >>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
> >>>>
> >>>> If you write it the way I have above, you'll end up matching first
> >>>> rule for input 10.20.30.40/5
> >>>>
> >>>> But if you write it this way:
> >>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
> >>>> rule=:%ip:ipv4%%last:rest%
> >>>>
> >>>> You'll end up matching the first one.
> >>>>
> >>>>
> >>> This shouldn't happen. The theory is:
> >>>
> >>> Let i be the current index to be looked at in the line. If for i a parser
> >>> is selected, parsers shall be tried first (in theory, according to parser
> >>> ordering, but I think this is not yet fully implemented). If a parser
> >>> fits, processing is advanced to next tree node.
> >>>
> >>> If the node at i does not have a parser (or all parsers failed, I think
> >>> [but not sure]), advance to next node based on character match.
>
> This is precisely what it does.
>
> >>>
> >>> The order of appearance of rules inside the rulebase should not affect
> >>> this.
>
> It doesn't for literal-subtree, but it does for field-subtree,
> because they are inserted at the tail of the linked-list.
>
> This code (
> https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394)
> adds new subtrees at the end of linked-list, which is what causes the
> ordering-sensitive behaviour.
>
>
OK, it seems like I overlooked this effect. I don't think it is good to
have any order dependence. Anyway, the work I am carrying out will most
probably lead to algorithmic changes and I'll re-evaluate that when I reach
that point (not soon). Of course, I won't break anything that exists. If
things diverge too much, I'll add an alternate library. But again, this
remains to be seen and it is too early to think about this.
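
To make the effect concrete, here is a minimal toy sketch of why tail
insertion makes the result order dependent. It is NOT liblognorm code: the
parser functions, load() and match() are hypothetical stand-ins, and the
second rule is simplified to "ipv4 followed by /port".

```python
# Toy model only: none of these names exist in liblognorm.
import re

IPV4 = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
SLASH_PORT = re.compile(r"/\d+")

def parse_ipv4(line, i):
    """Return the index just past an IPv4 address at i, or None."""
    m = IPV4.match(line, i)
    return m.end() if m else None

def parse_rest(line, i):
    """'rest' always matches and consumes everything that is left."""
    return len(line)

def parse_slash_port(line, i):
    """Return the index just past '/<digits>' at i, or None."""
    m = SLASH_PORT.match(line, i)
    return m.end() if m else None

RULE_REST = ("ip+rest", parse_rest)
RULE_PORT = ("ip+port", parse_slash_port)

def load(rulebase):
    """Append each rule's field alternative at the tail, in rulebase order."""
    alternatives = []
    for rule in rulebase:
        alternatives.append(rule)  # tail insertion keeps rulebase order
    return alternatives

def match(alternatives, line):
    """Try alternatives head to tail; the first one that fits wins."""
    i = parse_ipv4(line, 0)
    if i is None:
        return None
    for name, parser in alternatives:
        if parser(line, i) == len(line):
            return name
    return None

line = "10.20.30.40/5"
rest_first = match(load([RULE_REST, RULE_PORT]), line)
port_first = match(load([RULE_PORT, RULE_REST]), line)
```

With these assumptions, rest_first names the "rest" rule and port_first names
the "port" rule for the same input, so the winner depends only on where each
rule appears in the rulebase.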

On the ordering issue: are you sure that the order is always properly
preserved? I never put any effort into it (as order was designed to be
irrelevant) and some reordering (IIRC) happens intentionally (parser
priorities).
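
As a rough illustration of what priority-based reordering could look like,
here is a sketch that sorts the alternatives by weight once after loading.
Only the weights for ipv4 (10) and rest (1) come from this thread; the other
weights, the tie-break on name and the function name are assumptions made up
for the example, not existing liblognorm behaviour.

```python
# Hypothetical weights: only ipv4=10 and rest=1 are taken from the thread.
WEIGHTS = {"ipv4": 10, "number": 5, "char-sep": 3, "rest": 1}

def sort_by_cost(parsers):
    """Return parser names ordered heaviest (most specific) first.

    Ties are broken by name so the result never depends on input order.
    """
    return sorted(parsers, key=lambda name: (-WEIGHTS[name], name))

# The same set of parsers, inserted in two different rulebase orders.
order_a = sort_by_cost(["rest", "char-sep", "ipv4"])
order_b = sort_by_cost(["ipv4", "rest", "char-sep"])
```

Both calls yield the same list with "rest" at the end, so under this scheme
the greedy parser would only be tried after the more specific ones,
regardless of how the rulebase was written.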

Rainer


> >>> If it does, it's either not yet implemented or a bug. This is also why I
> >>> don't like the "rest" syntax - it always matches and thus terminates
> >>> interpretation.
> >>>
> >>
> >> I'll post a simple test case when I get into the office in a bit.
> >>
> >> In this particular case, it's failing to check other parsers when it hits
> >> a failure and backs up.
> >>
> >> But there are other cases where multiple rules may match. stringto, rest,
> >
> >
> > word, stringto are "last resort parsers", to be used only if everything else
> > fails.
> > rest IMHO should never be used, but I think I can propose something in the
> > future that solves the need that comes with it (if there still is a need at
> > that point).
> >
> >
> >> iptables
> >
> >
> > iptables is a different story, it's actually for a different type of logs -
> > at least I think so now. I am unfortunately not prepared to discuss this
> > right now, as I want to keep concentrated on the log structure analyzer. It
> > doesn't help if I do a bit of everything without anything ever nearing
> > completion ;)
> >
> >
> >> are all things that can easily match a lot of data where other rules may
> >> also match by having more specific listings. In such cases it should still
> >> be deterministic which rule 'wins'. I can think of a few ways to define
> >> this.
> >>
> >> 1. fewest parsers needed wins
> >>
> >> 2. most parsers needed wins
>
> This is probably the closest simple approximation to best match.
>
> I was thinking about this too.
>
> >>
> >> 3. ordering of parsers, where the 'greedier' ones are put last so they
> >> only come into play if the more specific ones don't match.
>
> We could assist it by setting relative weights etc. Eg. ipv4 gets
> weight 10, but rest gets only 1 etc.
>
> Once we get the coefficients right, this can probably be achieved (it's
> like a costing-based picker, run once ptree has been loaded to sort
> all subtree lists by cost in one shot).
>
> >>
> >>
> > That's the designed approach, and I am very sure it's the right one. As I
> > said, it's at least not fully implemented.
> >
> > This also means we need many more specific parsers. I never get there,
> > because of a) time shortage and b) lack of sufficient log samples, where
> > "log samples" means not a single line or two, but at least several thousand,
> > so that I can evaluate false positives. While b) is still a very big problem
> > to me, a) has been much relaxed thanks to the thesis work. Also, work on
> > the semi-automatic rule creator looks promising. As it is a heuristic, the
> > lack of log samples unfortunately is a very large stumbling block.
> >
> > Rainer
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
>
>
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com