On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards
<rgerha...@hq.adiscon.com> wrote:
> 2015-03-12 16:41 GMT+01:00 David Lang <da...@lang.hm>:
>
>> On Thu, 12 Mar 2015, Rainer Gerhards wrote:
>>
>>  2015-03-12 5:55 GMT+01:00 singh.janmejay <singh.janme...@gmail.com>:
>>>
>>>  On Thu, Mar 12, 2015 at 9:19 AM, David Lang <da...@lang.hm> wrote:
>>>>
>>>>> On Thu, 12 Mar 2015, singh.janmejay wrote:
>>>>>
>>>>>  Tried re-ordering it? Put the one with /port first?
>>>>>>
>>>>>
>>>>>
>>>>> no, lognorm rules are not supposed to be order dependent, so I didn't
>>>>> try
>>>>> that (especially after finding things failing to parse with rsyslog that
>>>>> worked manually)
>>>>>
>>>>
>>>> If no input string can match more than one rule, you are right,
>>>> order won't matter. But when the rules overlap, order does
>>>> matter, because the first one to match the string wins.
>>>>
>>>> Consider this rulebase:
>>>> rule=:%ip:ipv4%%last:rest%
>>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>>>>
>>>> If you write it the way I have above, you'll end up matching the
>>>> first rule (the one ending in rest) for input 10.20.30.40/5
>>>>
>>>> But if you write it this way:
>>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>>>> rule=:%ip:ipv4%%last:rest%
>>>>
>>>> You'll again end up matching the first one listed, which is now the
>>>> /port rule.
>>>>
>>>>
>>> This shouldn't happen. The theory is:
>>>
>>> Let i be the current index to be looked at at the line. If for i a parser
>>> is selected, parsers shall be tried first (in theory, according to parser
>>> ordering, but I think this is not yet fully implemented). If a parser
>>> fits,
>>> processing is advanced to next tree node.
>>>
>>> If the node at i does not have a parser (or all parsers failed, I think
>>> [but not sure]), advance to next node based on character match.

This is precisely what it does.

>>>
>>> The order of appearance of rules inside the rulebase should not affect
>>> this.

It doesn't for literal subtrees, but it does for field subtrees,
because new field subtrees are appended to the tail of a linked list
and are tried in list order.

This code (https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394)
adds new subtrees at the end of the linked list, which is what causes the
ordering-sensitive behaviour.
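
To make the effect concrete, here is a toy model in Python. It is not
the liblognorm code: the Node class, the parser functions and the
tuple-based rule encoding are all made up for illustration, and I am
assuming char-sep may match an empty string. It only mimics the two
properties that matter here: field subtrees are appended at the tail,
and the walk tries them in list order before falling back to literals.

```python
import re

# Toy parsers (hypothetical): return number of chars consumed, or None.
def parse_ipv4(s, extra=None):
    m = re.match(r"\d{1,3}(\.\d{1,3}){3}", s)
    return m.end() if m else None

def parse_number(s, extra=None):
    m = re.match(r"\d+", s)
    return m.end() if m else None

def parse_rest(s, extra=None):
    return len(s)  # always succeeds and swallows everything

def parse_char_sep(s, extra):
    i = s.find(extra)  # stop before the separator; may consume nothing
    return len(s) if i < 0 else i

PARSERS = {"ipv4": parse_ipv4, "number": parse_number,
           "rest": parse_rest, "char-sep": parse_char_sep}

class Node:
    def __init__(self):
        self.fields = []     # (name, ptype, extra, child), in insertion order
        self.literals = {}   # char -> child
        self.terminal = False

def add_rule(root, rule):
    node = root
    for elem in rule:
        if isinstance(elem, str):              # literal text
            for ch in elem:
                node = node.literals.setdefault(ch, Node())
            continue
        for entry in node.fields:              # reuse an identical field
            if entry[:3] == elem:
                node = entry[3]
                break
        else:
            child = Node()
            node.fields.append(elem + (child,))  # tail insertion
            node = child
    node.terminal = True

def build(rules):
    root = Node()
    for rule in rules:
        add_rule(root, rule)
    return root

def normalize(node, s):
    if not s and node.terminal:
        return {}
    for name, ptype, extra, child in node.fields:  # parsers first, in order
        n = PARSERS[ptype](s, extra)
        if n is None:
            continue
        sub = normalize(child, s[n:])
        if sub is not None:
            return {name: s[:n], **sub}            # first full match wins
    if s and s[0] in node.literals:                # then literal match
        return normalize(node.literals[s[0]], s[1:])
    return None

# The two rules from this thread, in the toy encoding.
RULE_REST = [("ip", "ipv4", None), ("last", "rest", None)]
RULE_PORT = [("ip", "ipv4", None), ("junk", "char-sep", "/"), "/",
             ("port", "number", None)]

print(normalize(build([RULE_REST, RULE_PORT]), "10.20.30.40/5"))
print(normalize(build([RULE_PORT, RULE_REST]), "10.20.30.40/5"))
```

With the rest rule loaded first, the toy returns ip and last="/5";
with the /port rule loaded first, it returns ip, an empty junk and
port="5". Both rules share the ipv4 node, so the only difference is
the order of the two field entries hanging below it.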

>>> If it does, it's either not yet implemented or a bug. This is also why I
>>> don't like the "rest" syntax: it always matches and thus terminates
>>> interpretation.
>>>
>>
>> I'll post a simple test case when I get into the office in a bit.
>>
>> In this particular case, it's failing to check other parsers when it hits
>> a failure and backs up.
>>
>> But there are other cases where multiple rules may match. stringto, rest,
>
>
> word, stringto are "last resort parsers", to be used only if anything else
> fails.
> rest IMHO should never be used, but I think I can propose something in the
> future that solves the need that comes with it (if there still is a need at
> that point).
>
>
>> iptables
>
>
> iptables is a different story, it's actually for a different type of logs -
> at least I think so now. I am unfortunately not prepared to discuss this
> right now, as I want to keep concentrated on the log structure analyzer. It
> doesn't help if I do a bit of everything without anything ever nearing
> completion ;)
>
>
>> are all things that can easily match a lot of data where other rules may
>> also match by having more specific listings. In such cases it should still
>> be deterministic which rule 'wins'. I can think of a few ways to define
>> this.
>>
>> 1. fewest parsers needed wins
>>
>> 2. most parsers needed wins

This is probably the closest simple approximation to best match.

I was thinking about this too.

>>
>> 3. ordering of parsers, where the 'greedier' ones are put last so they
>> only come into play if the more specific ones don't match.

We could assist it by assigning relative weights to parsers, e.g. ipv4
gets weight 10, but rest gets only 1.

Once we get the coefficients right, this can probably be achieved. It
would work like a cost-based picker that runs once after the ptree has
been loaded and sorts all subtree lists by cost in one shot.
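
A rough sketch of that one-shot pass, again in Python and again not
liblognorm code. The Node and Field classes are hypothetical, and only
the weights for ipv4 (10) and rest (1) come from the example above;
the other weights and the default are placeholders I picked to have
something to sort.

```python
from dataclasses import dataclass, field

# Higher weight = more specific parser = tried earlier.
# Only ipv4 and rest are from the mail; the rest are placeholders.
WEIGHTS = {"ipv4": 10, "number": 8, "char-sep": 5, "word": 2, "rest": 1}
DEFAULT_WEIGHT = 5  # assumed fallback for parsers without a weight

@dataclass
class Field:
    name: str
    ptype: str
    child: "Node"

@dataclass
class Node:
    fields: list = field(default_factory=list)    # field subtrees, ordered
    literals: dict = field(default_factory=dict)  # char -> Node

def sort_by_weight(node):
    """Sort every field-subtree list in the tree, most specific first."""
    # list.sort is stable, so parsers with equal weight keep rulebase order.
    node.fields.sort(key=lambda f: WEIGHTS.get(f.ptype, DEFAULT_WEIGHT),
                     reverse=True)
    for f in node.fields:
        sort_by_weight(f.child)
    for child in node.literals.values():
        sort_by_weight(child)

# Field list as it would look after loading "rest" before the others.
root = Node(fields=[Field("last", "rest", Node()),
                    Field("junk", "char-sep", Node()),
                    Field("ip", "ipv4", Node())])
sort_by_weight(root)
print([f.ptype for f in root.fields])
```

After the pass the list order is ipv4, char-sep, rest regardless of
how the rules were ordered in the rulebase, so rest only gets a chance
when the more specific parsers have failed.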

>>
>>
> That's the designed approach, and I am very sure it's the right one. As I
> said, it's at least not fully implemented.
>
> This also means we need many more specific parsers. I have never gotten
> there, because of a) time shortage and b) lack of sufficient log samples.
> By log samples I mean not a single line or two, but at least several
> thousand, so that I can evaluate false positives. While b) is still a very
> big problem to me, a) has been much relaxed thanks to the thesis work. Also,
> work on the semi-automatic rule creator looks promising. As it is a
> heuristic, the lack of log samples unfortunately is a very large obstacle.
>
> Rainer
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T 
> LIKE THAT.



-- 
Regards,
Janmejay
http://codehunk.wordpress.com