Re: [rsyslog] mmnormalize thoughts

singh.janmejay Thu, 12 Mar 2015 10:46:08 -0700

I haven't seen the reordering code yet, but the loading does preserve order.


It is still deterministic; it's just that the criterion is rule order (and
the fact that this applies only to field subtrees makes it slightly odd).

On Thu, Mar 12, 2015 at 10:55 PM, Rainer Gerhards
<rgerha...@hq.adiscon.com> wrote:
> 2015-03-12 18:16 GMT+01:00 singh.janmejay <singh.janme...@gmail.com>:
>
>> On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards
>> <rgerha...@hq.adiscon.com> wrote:
>> > 2015-03-12 16:41 GMT+01:00 David Lang <da...@lang.hm>:
>> >
>> >> On Thu, 12 Mar 2015, Rainer Gerhards wrote:
>> >>
>> >>  2015-03-12 5:55 GMT+01:00 singh.janmejay <singh.janme...@gmail.com>:
>> >>>
>> >>>  On Thu, Mar 12, 2015 at 9:19 AM, David Lang <da...@lang.hm> wrote:
>> >>>>
>> >>>>> On Thu, 12 Mar 2015, singh.janmejay wrote:
>> >>>>>
>> >>>>>  Tried re-ordering it? Put the one with /port first?
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>> no, lognorm rules are not supposed to be order dependent, so I didn't
>> >>>>> try that (especially after finding things failing to parse with
>> >>>>> rsyslog that worked manually)
>> >>>>>
>> >>>>
>> >>>> When the sets of strings matched by the rules are disjoint, you are
>> >>>> right, order won't matter. But when they are not disjoint, order does
>> >>>> matter, because the first one to match the string wins.
>> >>>>
>> >>>> Consider this rulebase:
>> >>>> rule=:%ip:ipv4%%last:rest%
>> >>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>> >>>>
>> >>>> If you write it the way I have above, you'll end up matching the first
>> >>>> rule for input 10.20.30.40/5
>> >>>>
>> >>>> But if you write it this way:
>> >>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>> >>>> rule=:%ip:ipv4%%last:rest%
>> >>>>
>> >>>> You'll end up matching the first one (now the /port rule).
>> >>>>
>> >>>>
>> >>> This shouldn't happen. The theory is:
>> >>>
>> >>> Let i be the current index being looked at in the line. If the node
>> >>> for i has parsers, they shall be tried first (in theory, according to
>> >>> parser ordering, but I think this is not yet fully implemented). If a
>> >>> parser fits, processing advances to the next tree node.
>> >>>
>> >>> If the node at i does not have a parser (or all parsers failed, I think
>> >>> [but not sure]), advance to the next node based on character match.
>>
>> This is precisely what it does.
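A minimal sketch of the walk described above, written as a toy in C. Everything here is hypothetical: `struct node`, `walk()`, the simplified parsers and `match_example()` are illustrative names, not liblognorm's ptree API. It assumes field edges are tried in list order and that the walk backtracks to the next field edge when a deeper match fails (David reports further down that the real code may not always do this). The tree is built by hand for the two-rule example earlier in the thread, so the only thing that changes between runs is the order of the two field edges after the shared `ipv4` node.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy parse tree; every name here is hypothetical, not liblognorm's API. */
#define MAX_EDGES 4

/* A field parser returns the number of bytes it consumed, or -1 on failure. */
typedef int (*field_parser)(const char *s);

struct node;
struct field_edge { field_parser parse; struct node *next; };
struct lit_edge { char c; struct node *next; };

struct node {
    int rule_id;                          /* nonzero: a rule ends here */
    size_t nfields, nlits;
    struct field_edge fields[MAX_EDGES];  /* tried in array order */
    struct lit_edge lits[MAX_EDGES];
};

/* Match s from offset i: try field edges first, in order, backtracking to the
 * next edge if a deeper match fails; then fall back to the literal edge for
 * s[i]. Returns the id of the matched rule, or 0 if nothing matches. */
static int walk(const struct node *n, const char *s, size_t i)
{
    assert(n != NULL && s != NULL);
    if (s[i] == '\0' && n->rule_id != 0)
        return n->rule_id;
    for (size_t k = 0; k < n->nfields; k++) {
        int used = n->fields[k].parse(s + i);
        if (used >= 0) {
            int r = walk(n->fields[k].next, s, i + (size_t)used);
            if (r != 0)
                return r;                 /* first field edge that fully matches wins */
        }
    }
    for (size_t k = 0; k < n->nlits; k++)
        if (s[i] != '\0' && n->lits[k].c == s[i])
            return walk(n->lits[k].next, s, i + 1);
    return 0;
}

/* Simplified parsers: lax dotted quad (no 0-255 check), digits, rest, char-sep:/ */
static int p_ipv4(const char *s)
{
    int p = 0;
    for (int octet = 0; octet < 4; octet++) {
        int start = p;
        while (isdigit((unsigned char)s[p]))
            p++;
        if (p == start || p - start > 3)
            return -1;
        if (octet < 3 && s[p++] != '.')
            return -1;
    }
    return p;
}

static int p_number(const char *s)
{
    int p = 0;
    while (isdigit((unsigned char)s[p]))
        p++;
    return p > 0 ? p : -1;
}

static int p_rest(const char *s) { return (int)strlen(s); }

static int p_sep_slash(const char *s) { return (int)strcspn(s, "/"); }

/* Hand-built tree for the two example rules. port_first selects the order of
 * the two field edges hanging off the shared node after %ip:ipv4%. */
static int match_example(const char *line, int port_first)
{
    struct node t_rest = { .rule_id = 1 };  /* rule=:%ip:ipv4%%last:rest% */
    struct node t_port = { .rule_id = 2 };  /* rule=:%ip:ipv4%%junk:char-sep:/%/%port:number% */
    struct node after_slash = { .nfields = 1, .fields = { { p_number, &t_port } } };
    struct node at_slash = { .nlits = 1, .lits = { { '/', &after_slash } } };
    struct field_edge rest = { p_rest, &t_rest };
    struct field_edge sep = { p_sep_slash, &at_slash };
    struct node after_ip = { .nfields = 2 };
    after_ip.fields[0] = port_first ? sep : rest;
    after_ip.fields[1] = port_first ? rest : sep;
    struct node root = { .nfields = 1, .fields = { { p_ipv4, &after_ip } } };
    return walk(&root, line, 0);
}
```

In this toy, `match_example("10.20.30.40/5", 0)` (rest edge first) yields rule 1, while `match_example("10.20.30.40/5", 1)` (port edge first) yields rule 2. The input `"10.20.30.40"` ends up at rule 1 in either order, because the port path fails and the walk backtracks to the rest edge.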
>>
>> >>>
>> >>> The order of appearance of rules inside the rulebase should not affect
>> >>> this.
>>
>> It doesn't for literal subtrees, but it does for field subtrees,
>> because they are inserted at the tail of the linked list.
>>
>> This code (
>> https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394)
>> adds new subtrees at the end of the linked list, which is what causes the
>> ordering-sensitive behaviour.
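To make the tail-insertion point concrete, here is a hypothetical C sketch (not the code at the ptree.c link above; all names are illustrative). Field edges sit in a singly linked list with a tail pointer and are appended at the end, so the list keeps rule-load order. Literal edges, by contrast, are assumed here to be indexed by byte value, so the order in which they were added cannot affect lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structures for illustration; not liblognorm's definitions. */
struct field_edge {
    const char *parser;               /* e.g. "ipv4", "rest", "char-sep" */
    struct field_edge *next;
};

struct toy_node {
    struct field_edge *head, *tail;   /* field edges, kept in insertion order */
    struct toy_node *lit[256];        /* literal edges, indexed by byte value */
};

/* Append at the tail: the list ends up in rule-load order, so the field edge
 * from whichever rule was loaded first is also tried first when matching. */
static void add_field_tail(struct toy_node *n, struct field_edge *e)
{
    assert(n != NULL && e != NULL);
    e->next = NULL;
    if (n->tail == NULL)
        n->head = e;
    else
        n->tail->next = e;
    n->tail = e;
}

/* Literal edges are found by direct indexing on the next input byte, so the
 * order in which they were added cannot change which one is taken. */
static void add_literal(struct toy_node *n, unsigned char c, struct toy_node *child)
{
    assert(n != NULL);
    n->lit[c] = child;
}
```

Loading the same two rules in opposite orders gives field lists whose heads differ, while the literal table ends up identical either way.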
>>
>>
> OK, it seems like I overlooked this effect. I don't think it is good to
> have any order dependence. Anyway, the work I am carrying out will most
> probably lead to algorithmic changes and I'll re-evaluate that when I reach
> that point (not soon). Of course, I won't break anything that exists. If
> things diverge too much, I'll add an alternate library. But again, this
> needs to be seen and it is too early to think about it.
>
> On the ordering issue: are you sure that the order is always properly
> preserved? I never put any effort into it (as order was designed to be
> irrelevant) and some reordering (IIRC) happens intentionally (parser
> priorities).
>
> Rainer
>
>
>> >>> If it does, it's either not yet implemented or a bug. This is also why I
>> >>> don't like the "rest" syntax: it always matches and thus terminates
>> >>> interpretation.
>> >>>
>> >>
>> >> I'll post a simple test case when I get into the office in a bit.
>> >>
>> >> In this particular case, it's failing to check other parsers when it
>> >> hits a failure and backs up.
>> >>
>> >> But there are other cases where multiple rules may match. stringto, rest,
>> >
>> >
>> > word and stringto are "last resort parsers", to be used only if anything
>> > else fails.
>> > rest IMHO should never be used, but I think I can propose something in
>> > the future that solves the need that comes with it (if there still is a
>> > need at that point).
>> >
>> >
>> >> iptables
>> >
>> >
>> > iptables is a different story; it's actually for a different type of
>> > logs, at least I think so now. I am unfortunately not prepared to discuss
>> > this right now, as I want to stay focused on the log structure analyzer.
>> > It doesn't help if I do a bit of everything without anything ever nearing
>> > completion ;)
>> >
>> >
>> >> are all things that can easily match a lot of data where other rules may
>> >> also match by having more specific listings. In such cases it should
>> >> still be deterministic which rule 'wins'. I can think of a few ways to
>> >> define this.
>> >>
>> >> 1. fewest parsers needed wins
>> >>
>> >> 2. most parsers needed wins
>>
>> This is probably the closest simple approximation to best match.
>>
>> I was thinking about this too.
>>
>> >>
>> >> 3. ordering of parsers, where the 'greedier' ones are put last so they
>> >> only come into play if the more specific ones don't match.
>>
>> We could assist it by setting relative weights etc. E.g. ipv4 gets
>> weight 10, but rest gets only 1.
>>
>> Once we get the coefficients right, this can probably be achieved (it's
>> like a cost-based picker, run once the ptree has been loaded, to sort
>> all subtree lists by cost in one shot).
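The cost-based picker suggested above could be sketched as a one-shot, stable sort of each node's field-edge list after loading. This is a hypothetical C sketch: the struct, `parser_weight()` and `sort_by_weight()` are illustrative names, and only the weights ipv4 = 10 and rest = 1 come from the message; the rest of the table, and the default of 5 for unknown parsers, are made up. A stable sort keeps rule-load order as the tie-breaker, so equal-weight parsers stay in the order a tail-appended list would give them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct field_edge {
    const char *parser;      /* parser type name, e.g. "ipv4" */
    int weight;              /* higher = more specific = tried earlier */
    struct field_edge *next;
};

/* Hypothetical weights: ipv4 = 10 and rest = 1 follow the example in the
 * message above; every other value is made up for illustration. */
static int parser_weight(const char *parser)
{
    static const struct { const char *name; int w; } table[] = {
        { "ipv4", 10 }, { "number", 8 }, { "char-sep", 5 },
        { "word", 2 }, { "rest", 1 },
    };
    assert(parser != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, parser) == 0)
            return table[i].w;
    return 5;                /* assumed default for parsers not in the table */
}

/* Run once per node after the rulebase is loaded: stable insertion sort by
 * descending weight. Each edge is inserted after every edge whose weight is
 * greater than or equal to its own, so ties keep their rule-load order. */
static struct field_edge *sort_by_weight(struct field_edge *head)
{
    struct field_edge *sorted = NULL;
    while (head != NULL) {
        struct field_edge *e = head;
        head = head->next;
        e->weight = parser_weight(e->parser);
        struct field_edge **pp = &sorted;
        while (*pp != NULL && (*pp)->weight >= e->weight)
            pp = &(*pp)->next;
        e->next = *pp;
        *pp = e;
    }
    return sorted;
}
```

Insertion sort is quadratic, but the field-edge list at a single node is presumably short and the sort runs only once per node, so simplicity seems the better trade-off for a sketch.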
>>
>> >>
>> >>
>> > That's the designed approach, and I am very sure it's the right one. As I
>> > said, it's at least not fully implemented.
>> >
>> > This also means we need many more specific parsers. I never got there
>> > because of a) time shortage and b) lack of sufficient log samples, and by
>> > log samples I mean not a single line or two, but at least several
>> > thousand, so that I can evaluate false positives. While b) is still a very
>> > big problem for me, a) has been much relaxed thanks to the thesis work.
>> > Also, work on the semi-automatic rule creator looks promising. As it is a
>> > heuristic, the lack of log samples unfortunately is a very large
>> > hindering block.
>> >
>> > Rainer
>> > _______________________________________________
>> > rsyslog mailing list
>> > http://lists.adiscon.net/mailman/listinfo/rsyslog
>> > http://www.rsyslog.com/professional-services/
>> > What's up with rsyslog? Follow https://twitter.com/rgerhards
>> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>> > of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>> > DON'T LIKE THAT.
>>
>>
>>
>> --
>> Regards,
>> Janmejay
>> http://codehunk.wordpress.com
>>



-- 
Regards,
Janmejay
http://codehunk.wordpress.com