[rsyslog] mmnormalize thoughts

2015-02-03 Thread David Lang
as I'm spending a bunch of time making templates from cisco logs, a few thoughts 
on mmnormalize


1. It should probably set parsesuccess like mmjsonparse does

2. it would be useful to have something like char-to that accepted multiple 
characters as the termination pattern. thanks to the addition of tokenize I was 
able to work around this ('flags FIN ACK  on interface' where the number of 
flags listed is variable)


3. the number type should accept negative numbers, not just digits


4. it would be fantastic to be able to define custom types in the config

example

inside:1.2.3.4/56 is a pattern that happens a lot and I use 
%srciface:char-to:\x3a%\x3a%srcip:ipv4%/%srcport:number% and 
%dstiface:char-to:\x3a%\x3a%dstip:ipv4%/%dstport:number% to match this pattern


I would like to be able to define

custom=info:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%

and then use "%src:info% to %dst:info%" instead of that full pattern and have the 
resulting json be

{ src : { iface : inside, ip : 1.2.3.4, port : 56 }, { dst...


5. Going back to the 'or' question. It would be even better to be able to define 
this custom type as a set of patterns.


while inside:1.2.3.4/56 is a common endpoint definition there are also 
1.2.3.4/56 inside:1.2.3.4/56(string) inside/1.2.3.4 and 1.2.3.4


if you could define the custom type to be a list of patterns this would let you 
take advantage of the two-dimensional nature of JSON and simplify the ruleset 
considerably.


It would also give you a good way to handle the 'or' for Apache logs for example 
defining one of the options as a constant '-'


defining an 'or' inside each pattern is a horrible mess to try and understand, 
but if it's done by implementing a new type, I don't have a problem with it.
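
As a rough illustration of points 4 and 5 (this is not liblognorm syntax or
code), the Python sketch below models a reusable endpoint type defined as an
ordered list of alternative patterns, each producing a nested object. The
regular expressions, the names ENDPOINT_PATTERNS, parse_endpoint and
parse_connection, and the extra "name" key for the "(string)" suffix are all
assumptions made for the example.

```python
import re

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

# Alternatives for one endpoint, most specific first (illustrative only).
ENDPOINT_PATTERNS = [
    re.compile(rf"(?P<iface>[^:/\s]+):(?P<ip>{IPV4})/(?P<port>\d+)\((?P<name>[^)]*)\)"),
    re.compile(rf"(?P<iface>[^:/\s]+):(?P<ip>{IPV4})/(?P<port>\d+)"),
    re.compile(rf"(?P<iface>[^:/\s]+)/(?P<ip>{IPV4})"),
    re.compile(rf"(?P<ip>{IPV4})/(?P<port>\d+)"),
    re.compile(rf"(?P<ip>{IPV4})"),
]


def parse_endpoint(text, pos=0):
    """Return (fields, end_offset) for the first alternative matching at pos."""
    for pattern in ENDPOINT_PATTERNS:
        m = pattern.match(text, pos)
        if m:
            fields = m.groupdict()
            if "port" in fields:
                fields["port"] = int(fields["port"])
            return fields, m.end()
    return None


def parse_connection(line, separator=" to "):
    """Parse '<endpoint> to <endpoint>' into {'src': {...}, 'dst': {...}}."""
    src = parse_endpoint(line)
    if src is None:
        return None
    src_fields, pos = src
    if not line.startswith(separator, pos):
        return None
    dst = parse_endpoint(line, pos + len(separator))
    if dst is None or dst[1] != len(line):
        return None
    return {"src": src_fields, "dst": dst[0]}


print(parse_connection("inside:1.2.3.4/56 to outside:5.6.7.8/80"))
```

The same parse_endpoint is reused for both ends of the connection, which is
the point of the proposal: the list of alternatives is written once and the
calling pattern only names the field that receives the nested result.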


David Lang
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] mmnormalize thoughts

2015-02-04 Thread singh.janmejay
On Wed, Feb 4, 2015 at 7:17 AM, David Lang  wrote:

> as I'm spending a bunch of time making templates from cisco logs, a few
> thoughts on mmnormalize
>
> 1. It should probably set parsesuccess like mmjsonparse does
>

This will be very useful.


>
> 2. it would be useful to have something like char-to that accepted
> multiple characters as the termination pattern. thanks to the addition of
> tokenize I was able to work around this ('flags FIN ACK  on interface'
> where the number of flags listed is variable)
>

I felt the need for this too. I believe the recent string-to thing does
this?


>
> 3. the number type should accept negative numbers, not just digits
>
>
> 4. it would be fantastic to be able to define custom types in the config
>
> example
>
> inside:1.2.3.4/56 is a pattern that happens a lot and I use
> %srciface:char-to:\x3a%\x3a%srcip:ipv4%/%srcport:number% and
> %dstiface:char-to:\x3a%\x3a%dstip:ipv4%/%dstport:number% to match this
> pattern
>
> , being able to define
>
> custom=info:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
>
> and then use "%src:info% to %dst:info% instead of that full pattern and
> have the resulting json be
> { src : { iface : inside, ip : 1.2.3.4, port : 56 }, { dst...
>
>

Field type 'descent' does this, but not exactly in the same way.





-- 
Regards,
Janmejay
http://codehunk.wordpress.com


Re: [rsyslog] mmnormalize thoughts

2015-02-04 Thread David Lang

On Wed, 4 Feb 2015, singh.janmejay wrote:

> On Wed, Feb 4, 2015 at 7:17 AM, David Lang  wrote:
>
>> as I'm spending a bunch of time making templates from cisco logs, a few
>> thoughts on mmnormalize
>>
>> 1. It should probably set parsesuccess like mmjsonparse does
>
> This will be very useful.
>
>> 2. it would be useful to have something like char-to that accepted
>> multiple characters as the termination pattern. thanks to the addition of
>> tokenize I was able to work around this ('flags FIN ACK  on interface'
>> where the number of flags listed is variable)
>
> I felt the need for this too. I believe the recent string-to thing does
> this?

I missed that. One thing that is wrong with liblognorm and mmnormalize is that 
the docs that are pointed to are horribly out of date and don't mention a lot of 
these capabilities. I cloned the source from github and was looking through it 
to find things, but apparently missed this one.






>> 3. the number type should accept negative numbers, not just digits
>>
>> 4. it would be fantastic to be able to define custom types in the config
>>
>> example
>>
>> inside:1.2.3.4/56 is a pattern that happens a lot and I use
>> %srciface:char-to:\x3a%\x3a%srcip:ipv4%/%srcport:number% and
>> %dstiface:char-to:\x3a%\x3a%dstip:ipv4%/%dstport:number% to match this
>> pattern
>>
>> , being able to define
>>
>> custom=info:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
>>
>> and then use "%src:info% to %dst:info% instead of that full pattern and
>> have the resulting json be
>> { src : { iface : inside, ip : 1.2.3.4, port : 56 }, { dst...
>
> Field type 'descent' does this, but not exactly in the same way.


does it? I understood it to just be calling another ruleset on the whole line 
(doc problem again)


David Lang







Re: [rsyslog] mmnormalize thoughts

2015-02-04 Thread singh.janmejay
On Wed, Feb 4, 2015 at 6:22 PM, David Lang  wrote:

> On Wed, 4 Feb 2015, singh.janmejay wrote:
>
>  On Wed, Feb 4, 2015 at 7:17 AM, David Lang  wrote:
>>
>>  as I'm spending a bunch of time making templates from cisco logs, a few
>>> thoughts on mmnormalize
>>>
>>> 1. It should probably set parsesuccess like mmjsonparse does
>>>
>>>
>> This will be very useful.
>>
>>
>>
>>> 2. it would be useful to have something like char-to that accepted
>>> multiple characters as the termination pattern. thanks to the addition of
>>> tokenize I was able to work around this ('flags FIN ACK  on interface'
>>> where the number of flags listed is variable)
>>>
>>>
>> I felt the need for this too. I believe the recent string-to thing does
>> this?
>>
>
> I missed that. One thing that is wrong with liblognorm and mmnormalize is
> that the docs that are pointed to are horribly out of date and don't
> mention a lot of these capabilities. I cloned the source from github and
> was looking through it to find things, but apparently missed this one.
>
>
This one: https://github.com/rsyslog/liblognorm/pull/20/files


>
>>
>>> 3. the number type should accept negative numbers, not just digits
>>>
>>>
>>> 4. it would be fantastic to be able to define custom types in the config
>>>
>>> example
>>>
>>> inside:1.2.3.4/56 is a pattern that happens a lot and I use
>>> %srciface:char-to:\x3a%\x3a%srcip:ipv4%/%srcport:number% and
>>> %dstiface:char-to:\x3a%\x3a%dstip:ipv4%/%dstport:number% to match this
>>> pattern
>>>
>>> , being able to define
>>>
>>> custom=info:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
>>>
>>> and then use "%src:info% to %dst:info% instead of that full pattern and
>>> have the resulting json be
>>> { src : { iface : inside, ip : 1.2.3.4, port : 56 }, { dst...
>>>
>>>
>>>
>> Field type 'descent' does this, but not exactly in the same way.
>>
>
> does it? I understood it to just be calling another ruleset on the whole
> line (doc problem again)
>

It allows the field to identify how the remaining text should be returned,
which allows that text to be parsed by the remaining part of the rule to
which the field belongs.

Here is a test which uses something similar to what you are trying to do:
https://github.com/rsyslog/liblognorm/blob/master/tests/field_tokenized_recursive.sh#L41

(check 41 to EOF)




-- 
Regards,
Janmejay
http://codehunk.wordpress.com


Re: [rsyslog] mmnormalize thoughts

2015-02-05 Thread David Lang

On Wed, 4 Feb 2015, singh.janmejay wrote:

> On Wed, Feb 4, 2015 at 6:22 PM, David Lang  wrote:
>
>> On Wed, 4 Feb 2015, singh.janmejay wrote:
>>
>>> Field type 'descent' does this, but not exactly in the same way.
>>
>> does it? I understood it to just be calling another ruleset on the whole
>> line (doc problem again)
>
> It allows field to identify how remaining-text should be returned, which
> allows it to be parsed by remaining part of the rule which the field
> belongs to.
>
> Here is a test which uses something similar to what you are trying to do:
> https://github.com/rsyslog/liblognorm/blob/master/tests/field_tokenized_recursive.sh#L41
>
> (check 41 to EOF)


This looks like it may do this, but it looks like it's not in the release yet. 
I'll have to compile from scratch.


David Lang


Re: [rsyslog] mmnormalize thoughts

2015-02-05 Thread singh.janmejay
It's going to be in the coming release, just master build for now.

--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft
keyboard sporting it's not-so-smart-assist technology.



Re: [rsyslog] mmnormalize thoughts

2015-02-06 Thread David Lang
While I'm working to build packages of this to test with, what happens if you 
descend into a ruleset like the following


rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%/%port:number%%last:rest%

will it work to find the match that has the least left in last?

David Lang




Re: [rsyslog] mmnormalize thoughts

2015-03-11 Thread singh.janmejay
Tried re-ordering it? Put the one with /port first?

Yes, rest must get at least one char to succeed. I'll create some new
tests without rest-capture (and see what fails).

On Thu, Mar 12, 2015 at 1:09 AM, David Lang  wrote:
> I just upgraded to liblognorm 1.1.1 (unfortunately I didn't get a chance to
> compile it myself and test it earlier)
>
> I ran into two problems
>
> first, %last:rest% does not match if there is nothing left on the line
>
> i.e. a line that ends with an IP address will not match
> rule=:%ip:ipv4%%last:rest%
>
> secondly, liblognorm is selecting the rule that matches the least amount of
> the message.
>
> so with these two rules
>
> rule=:%ip:ipv4%%last:rest%
> rule=:%ip:ipv4%/%port:number%%last:rest%
>
> 192.168.1.1/5 will get matched by the first rule, with '/5' in last, even
> though the second rule would match it. If I remove the first rule, the
> second rule does match and the parse succeeds.
>
> David Lang



-- 
Regards,
Janmejay
http://codehunk.wordpress.com


Re: [rsyslog] mmnormalize thoughts

2015-03-11 Thread David Lang

On Thu, 12 Mar 2015, singh.janmejay wrote:

> Tried re-ordering it? Put the one with /port first?

no, lognorm rules are not supposed to be order dependent, so I didn't try that 
(especially after finding things failing to parse with rsyslog that worked 
manually)

> Yes, rest must get at least one char to succeed. I'll create some new
> tests without rest-capture (and see what fails).

Ok, this can be worked around (but it's a bit ugly), any reason why rest has to 
get at least one character?


David Lang




Re: [rsyslog] mmnormalize thoughts

2015-03-11 Thread singh.janmejay
On Thu, Mar 12, 2015 at 9:19 AM, David Lang  wrote:
> On Thu, 12 Mar 2015, singh.janmejay wrote:
>
>> Tried re-ordering it? Put the one with /port first?
>
>
> no, lognorm rules are not supposed to be order dependent, so I didn't try
> that (especially after finding things failing to parse with rsyslog that
> worked manually)

When the sets of input strings that the rules can match are disjoint, you
are right, order won't matter. But when they overlap, order does matter,
because the first rule to match the string wins.

Consider this rulebase:
rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%

If you write it the way I have above, you'll end up matching the first
rule for input 10.20.30.40/5

But if you write it this way:
rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
rule=:%ip:ipv4%%last:rest%

You'll again end up matching the first one, which is now the rule that
captures the port.

I know it appears order independent for your original rulebase, but
that is because fields are always tried first (in preference to
subtrees hanging off literals), and rest is a field, while '/' creates
a literal subtree.
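
To make the ordering argument concrete, here is a toy Python model of a parse
tree in which fields at a node are tried in insertion order, before any
literal subtrees, and the first complete match wins. It is only a sketch of
the behaviour described above, not liblognorm's implementation: the Node
class, the parser helpers, the rule labels, and the simplification that rest
may match an empty remainder are all assumptions made for the example.

```python
import re

IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
NUMBER_RE = re.compile(r"\d+")


def ipv4(text, pos):
    m = IPV4_RE.match(text, pos)
    return m.end() if m else None


def number(text, pos):
    m = NUMBER_RE.match(text, pos)
    return m.end() if m else None


def rest(text, pos):
    return len(text)  # toy: takes everything that is left


def char_sep(sep):
    def parse(text, pos):
        end = text.find(sep, pos)
        return len(text) if end == -1 else end  # zero or more chars up to sep
    return parse


class Node:
    def __init__(self):
        self.fields = []    # (key, name, parser, child), in insertion order
        self.literals = {}  # literal text -> child node
        self.rule = None    # label of the rule that ends at this node


def add_rule(root, label, steps):
    node = root
    for step in steps:
        if step[0] == "lit":
            node = node.literals.setdefault(step[1], Node())
            continue
        _, name, kind, parser = step
        for key, _n, _p, child in node.fields:
            if key == (name, kind):  # share identical fields between rules
                node = child
                break
        else:
            child = Node()
            node.fields.append(((name, kind), name, parser, child))
            node = child
    node.rule = label


def build(rules):
    root = Node()
    for label, steps in rules:
        add_rule(root, label, steps)
    return root


def match(node, text, pos=0, captured=None):
    captured = captured or {}
    if pos == len(text) and node.rule is not None:
        return node.rule, captured
    for _key, name, parser, child in node.fields:  # fields are tried first
        end = parser(text, pos)
        if end is not None:
            found = match(child, text, end, {**captured, name: text[pos:end]})
            if found:
                return found
    for literal, child in node.literals.items():   # then literal subtrees
        if text.startswith(literal, pos):
            found = match(child, text, pos + len(literal), captured)
            if found:
                return found
    return None


IP = ("field", "ip", "ipv4", ipv4)
LAST = ("field", "last", "rest", rest)
PORT = ("field", "port", "number", number)
JUNK = ("field", "junk", "char-sep:/", char_sep("/"))

IP_REST = ("ip+rest", [IP, LAST])
IP_PORT_REST = ("ip/port+rest", [IP, ("lit", "/"), PORT, LAST])
IP_SEP_PORT = ("ip+sep/port", [IP, JUNK, ("lit", "/"), PORT])

# Field versus literal: the rest field wins whatever the rule order.
print(match(build([IP_REST, IP_PORT_REST]), "192.168.1.1/5"))
print(match(build([IP_PORT_REST, IP_REST]), "192.168.1.1/5"))
# Field versus field: the rule listed first wins.
print(match(build([IP_REST, IP_SEP_PORT]), "10.20.30.40/5"))
print(match(build([IP_SEP_PORT, IP_REST]), "10.20.30.40/5"))
```

In this model the first two calls both report the ip+rest rule with '/5'
captured in last, while the last two calls report whichever rule was added
first, which is the behaviour the char-sep workaround relies on.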

>
>> Yes, rest must get at least one char to succeed. I'll create some new
>> tests without rest-capture (and see what fails).
>
>
> Ok, this can be worked around (but it's a bit ugly), any reason why rest has
> to get at least one character?

Yep, it's annoying; it happens only for the last token.

The reason is that "parsed-fragment length >= input-string length" is used
as a termination condition for the ln_normalize recursion (see
ln_normalizeRec), and the last token identified when the recursion
terminates is not the terminal node, so it's not considered a complete
match (one that goes all the way to a leaf of the ptree).
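
A toy Python model of that explanation, for illustration only: it is not the
code of ln_normalizeRec, and the RULE list, the normalize function and its
check_length_first switch are assumptions invented for the sketch. It shows
how checking "input exhausted" before trying the next field makes a trailing
rest field fail when nothing is left on the line.

```python
import re

IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def parse_ipv4(text, pos):
    m = IPV4_RE.match(text, pos)
    return m.end() if m else None


def parse_rest(text, pos):
    return len(text)  # consumes whatever is left, possibly nothing


# Toy stand-in for rule=:%ip:ipv4%%last:rest%
RULE = [("ip", parse_ipv4), ("last", parse_rest)]


def normalize(text, pos=0, step=0, check_length_first=True):
    at_leaf = step == len(RULE)
    if check_length_first and pos >= len(text):
        # Input exhausted: succeed only if we already stand on the leaf.
        # A pending rest field never gets the chance to match "nothing".
        return at_leaf
    if at_leaf:
        return pos == len(text)
    _name, parser = RULE[step]
    end = parser(text, pos)
    if end is None:
        return False
    return normalize(text, end, step + 1, check_length_first)


print(normalize("1.2.3.4 tail"))                       # rest gets " tail"
print(normalize("1.2.3.4"))                            # rest would be empty
print(normalize("1.2.3.4", check_length_first=False))  # field tried first
```

In the toy, moving the length check after the field attempt lets the empty
rest succeed; whether the real fix is that simple is not something this
sketch can show.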

>
> David Lang
>
>
>> On Thu, Mar 12, 2015 at 1:09 AM, David Lang  wrote:
>>>
>>> I just upgraded to liblognorm 1.1.1 (unfortunately I didn't get a chance
>>> to compile it myself and test it earlier)
>>>
>>> I ran into two problems
>>>
>>> first, %last:rest% does not match if there is nothing left on the line
>>>
>>> i.e. a line that ends with an IP address will not match
>>> rule=:%ip:ipv4%%last:rest%
>>>
>>> secondly, liblognorm is selecting the rule that matches the least amount
>>> of
>>> the message.
>>>
>>> so with these two rules
>>>
>>> rule=:%ip:ipv4%%last:rest%
>>> rule=:%ip:ipv4%/%port:number%%last:rest%

I guess the hack I proposed above (using char-sep) can unblock you for
now, unless you hate its aesthetics too much :-).


Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread David Lang

On Thu, 12 Mar 2015, singh.janmejay wrote:


On Thu, Mar 12, 2015 at 9:19 AM, David Lang  wrote:

On Thu, 12 Mar 2015, singh.janmejay wrote:


Tried re-ordering it? Put the one with /port first?



no, lognorm rules are not supposed to be order dependent, so I didn't try
that (especially after finding things failing to parse with rsyslog that
worked manually)


In case of input strings being matching-rule-wise disjoint, you are
right, order won't matter. But when they are not disjoint, order does
matter, because the first one to match the string wins.

Consider this rulebase:
rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%

If you write it the way I have above, you'll end up matching first
rule for input 10.20.30.40/5


but when it can't find a match for / and has to undo the match and go back up 
the tree, why doesn't it try the next possible match? (repeating as needed until 
it has tried all possible branches of the tree)


David Lang


But if you write it this way:
rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
rule=:%ip:ipv4%%last:rest%

You'll end up matching the first one.

I know it appears order independent for your original rulebase, but
that is because fields are always tried first (in preference to
subtrees hanging off literals), and rest is a field, while '/' creates
a literal-subtree.
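
To make the order dependence concrete, here is a minimal Python sketch. It is a toy model, not liblognorm code: the helper names (ipv4, number, rest, char_sep, match, normalize) are invented for illustration, and rules are tried flat in rulebase order instead of through a parse tree. The only behaviours taken from this thread are that the first rule to match wins and that rest needs at least one character; that char-sep may match zero characters is an assumption implied by the workaround.

```python
import re

# Each toy parser returns the end index of its match, or None on failure.
def ipv4(s, i):
    m = re.match(r"\d{1,3}(?:\.\d{1,3}){3}", s[i:])
    return i + m.end() if m else None

def number(s, i):
    m = re.match(r"\d+", s[i:])
    return i + m.end() if m else None

def rest(s, i):
    # As reported in the thread: rest needs at least one character.
    return len(s) if i < len(s) else None

def char_sep(sep):
    # Assumption: consumes zero or more characters up to (not including) sep.
    def parse(s, i):
        j = s.find(sep, i)
        return j if j != -1 else len(s)
    return parse

def match(rule, s):
    """Apply one rule (a list of literals and (name, parser) pairs) to s."""
    i, out = 0, {}
    for part in rule:
        if isinstance(part, str):          # literal text
            if not s.startswith(part, i):
                return None
            i += len(part)
        else:                              # (field name, parser)
            name, parser = part
            end = parser(s, i)
            if end is None:
                return None
            out[name] = s[i:end]
            i = end
    return out if i == len(s) else None

def normalize(rulebase, s):
    """First rule that matches the whole string wins."""
    for rule in rulebase:
        out = match(rule, s)
        if out is not None:
            return out
    return None

# rule=:%ip:ipv4%%last:rest%
REST_RULE = [("ip", ipv4), ("last", rest)]
# rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
PORT_RULE = [("ip", ipv4), ("junk", char_sep("/")), "/", ("port", number)]

print(normalize([REST_RULE, PORT_RULE], "10.20.30.40/5"))
# {'ip': '10.20.30.40', 'last': '/5'}
print(normalize([PORT_RULE, REST_RULE], "10.20.30.40/5"))
# {'ip': '10.20.30.40', 'junk': '', 'port': '5'}
```

In this model, swapping the two rules changes the result because both alternatives begin with a field at the same position, so whichever was added first is tried first.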




Yes, rest must get at least one char to succeed. I'll create some new
tests without rest-capture (and see what fails).



Ok, this can be worked around (but it's a bit ugly), any reason why rest has
to get at least one character?


Yep, it's annoying; it happens only for the last token.

The reason is, parsed-fragment length >= input-string length is used as a
termination condition for ln_normalize recursion (see ln_normalizeRec)
and the last token identified when recursion terminates is not the
terminal-node, so it's not considered a complete match (one that goes
till the leaf of the ptree).



David Lang



On Thu, Mar 12, 2015 at 1:09 AM, David Lang  wrote:


I just upgraded to liblognorm 1.1.1 (unfortunately I didn't get a chance
to
compile it myself and test it earlier)

I ran into two problems

first, %last:rest% does not match if there is nothing left on the line

i.e. a line that ends with an IP address will not match
rule=:%ip:ipv4%%last:rest%

secondly, liblognorm is selecting the rule that matches the least amount
of
the message.

so with these two rules

rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%/%port:number%%last:rest%


I guess the hack I proposed above (using char-sep) can unblock you for
now, unless you hate its aesthetics too much :-).



192.168.1.1/5 will get matched by the first rule, with '/5' in last, even
though the second rule would match it. If I remove the first rule, the
second rule does match and the parse succeeds.

David Lang


On Fri, 6 Feb 2015, David Lang wrote:


While I'm working to build packages of this to test with, what happens
if
you descend into a ruleset like the following

rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%/%port:number%%last:rest%

will it work to find the match that has the least left in last?

David Lang


On Fri, 6 Feb 2015, singh.janmejay wrote:


It's going to be in the coming release, just master build for now.

--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft
keyboard sporting it's not-so-smart-assist technology.

On Feb 6, 2015 6:37 AM, "David Lang"  wrote:


On Wed, 4 Feb 2015, singh.janmejay wrote:

 On Wed, Feb 4, 2015 at 6:22 PM, David Lang  wrote:




 On Wed, 4 Feb 2015, singh.janmejay wrote:




 On Wed, Feb 4, 2015 at 7:17 AM, David Lang  wrote:





 Field type 'descent' does this, but not exactly in the same way.






does it? I understood it to just be calling another ruleset on the
whole
line (doc problem again)



It allows field to identify how remaining-text should be returned,
which
allows it to be parsed by remaining part of the rule which the field
belongs to.

Here is a test which uses something similar to what you are trying to
do:
https://github.com/rsyslog/liblognorm/blob/master/tests/
field_tokenized_recursive.sh#L41

(check 41 to EOF)



This looks like it may do this, but it looks like it's not in the
release
yet. I'll have to compile from scratch.

David Lang
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.



Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread David Lang
I just upgraded to liblognorm 1.1.1 (unfortunately I didn't get a chance to 
compile it myself and test it earlier)


I ran into two problems

first, %last:rest% does not match if there is nothing left on the line

i.e. a line that ends with an IP address will not match
rule=:%ip:ipv4%%last:rest%

secondly, liblognorm is selecting the rule that matches the least amount of the 
message.


so with these two rules

rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%/%port:number%%last:rest%

192.168.1.1/5 will get matched by the first rule, with '/5' in last, even though 
the second rule would match it. If I remove the first rule, the second rule does 
match and the parse succeeds.


David Lang

On Fri, 6 Feb 2015, David Lang wrote:

While I'm working to build packages of this to test with, what happens if you 
descend into a ruleset like the following


rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%/%port:number%%last:rest%

will it work to find the match that has the least left in last?

David Lang


On Fri, 6 Feb 2015, singh.janmejay wrote:


It's going to be in the coming release, just master build for now.

--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft
keyboard sporting it's not-so-smart-assist technology.

On Feb 6, 2015 6:37 AM, "David Lang"  wrote:


On Wed, 4 Feb 2015, singh.janmejay wrote:

 On Wed, Feb 4, 2015 at 6:22 PM, David Lang  wrote:


 On Wed, 4 Feb 2015, singh.janmejay wrote:


 On Wed, Feb 4, 2015 at 7:17 AM, David Lang  wrote:





 Field type 'descent' does this, but not exactly in the same way.




does it? I understood it to just be calling another ruleset on the whole
line (doc problem again)



It allows field to identify how remaining-text should be returned, which
allows it to be parsed by remaining part of the rule which the field
belongs to.

Here is a test which uses something similar to what you are trying to do:
https://github.com/rsyslog/liblognorm/blob/master/tests/
field_tokenized_recursive.sh#L41

(check 41 to EOF)



This looks like it may do this, but it looks like it's not in the release
yet. I'll have to compile from scratch.

David Lang


Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread singh.janmejay
It never goes back up because if any other rule was going to match the
current line, it would be a subtree of the current node (this is an
invariant).

It does try all sub-trees from any node before giving up. It first
tries all field-nodes, then appropriate literal-node.

In this case anything at the end will be matched by rest, the only
thing that rest will not match is string with 0 length, which the next
rule won't match anyway.

About 0-length suffix, I want to think a bit about how to support it
with descent. As of now it expects a remaining-text field.

I'm unsure if this answers your question though.
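
As a rough illustration of the lookup order described above, here is a small Python sketch of a parse tree in which every node tries its field children first and only then the literal child for the next character. This is a toy model under stated assumptions, not the liblognorm implementation: Node, add_rule, walk and the three parsers are hypothetical names, and the parsers are simplified regular expressions.

```python
import re

# Toy field parsers: return the end index of the match, or None on failure.
def ipv4(s, i):
    m = re.match(r"\d{1,3}(?:\.\d{1,3}){3}", s[i:])
    return i + m.end() if m else None

def number(s, i):
    m = re.match(r"\d+", s[i:])
    return i + m.end() if m else None

def rest(s, i):
    return len(s)  # swallows everything that is left

PARSERS = {"ipv4": ipv4, "number": number, "rest": rest}

class Node:
    def __init__(self):
        self.fields = []       # [(name, type, child)], tried first, in insertion order
        self.literals = {}     # char -> child, tried only after every field failed
        self.terminal = False  # True if some rule ends at this node

def add_rule(root, parts):
    """parts mixes (name, type) field tuples and literal strings."""
    node = root
    for part in parts:
        if isinstance(part, tuple):
            for name, ftype, child in node.fields:
                if (name, ftype) == part:   # share the common prefix
                    node = child
                    break
            else:
                child = Node()
                node.fields.append((part[0], part[1], child))
                node = child
        else:
            for ch in part:
                node = node.literals.setdefault(ch, Node())
    node.terminal = True

def walk(node, s, i=0):
    # Input exhausted: a match only if a rule ends exactly at this node.
    if i >= len(s):
        return {} if node.terminal else None
    for name, ftype, child in node.fields:      # fields first
        end = PARSERS[ftype](s, i)
        if end is not None:
            found = walk(child, s, end)
            if found is not None:
                return {name: s[i:end], **found}
    child = node.literals.get(s[i])             # then the literal subtree
    return walk(child, s, i + 1) if child is not None else None

root = Node()
add_rule(root, [("ip", "ipv4"), ("last", "rest")])
add_rule(root, [("ip", "ipv4"), "/", ("port", "number"), ("last", "rest")])

print(walk(root, "192.168.1.1/5"))  # {'ip': '192.168.1.1', 'last': '/5'}
print(walk(root, "192.168.1.1"))    # None
```

In this sketch the '/' subtree is never visited for 192.168.1.1/5, because rest is a field at the same node and always succeeds when text remains. The second call returns None because the input is used up at a node where no rule ends, which is one way to picture the 0-length suffix case.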

On Thu, Mar 12, 2015 at 1:05 PM, David Lang  wrote:
> On Thu, 12 Mar 2015, singh.janmejay wrote:
>
>> On Thu, Mar 12, 2015 at 9:19 AM, David Lang  wrote:
>>>
>>> On Thu, 12 Mar 2015, singh.janmejay wrote:
>>>
 Tried re-ordering it? Put the one with /port first?
>>>
>>>
>>>
>>> no, lognorm rules are not supposed to be order dependent, so I didn't try
>>> that (especially after finding things failing to parse with rsyslog that
>>> worked manually)
>>
>>
>> In case of input strings being matching-rule-wise disjoint, you are
>> right, order won't matter. But when they are not disjoint, order does
>> matter, because the first one to match the string wins.
>>
>> Consider this rulebase:
>> rule=:%ip:ipv4%%last:rest%
>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>>
>> If you write it the way I have above, you'll end up matching first
>> rule for input 10.20.30.40/5
>
>
> but when it can't find a match for / and has to undo the match and go back
> up the tree, why doesn't it try the next possible match? (repeating as
> needed until it has tried all possible branches of the tree)
>
> David Lang
>
>
>> But if you write it this way:
>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>> rule=:%ip:ipv4%%last:rest%
>>
>> You'll end up matching the first one.
>>
>> I know it appears order independent for your original rulebase, but
>> that is because fields are always tried first(in preference to
>> subtrees hanging off literals), and rest is a field, while '/' creates
>> a  litteral-subtree.
>>
>>>
 Yes, rest must get atleast one char to succeed. I'll create some new
 tests without rest-capture (and see what fails).
>>>
>>>
>>>
>>> Ok, this can be worked around (but it's a bit ugly), any reason why rest
>>> has
>>> to get at least one character?
>>
>>
>> Yep, its annoying, it happens only for last token.
>>
>> The reason is, parsed-fragment length >= input-string is used as a
>> termination condition for ln_normalize recursion (see ln_normalizeRec)
>> and the last token identified when recursion terminates is not the
>> terminal-node, so its not considered a complete match(one that goes
>> till leaf of ptree).
>>
>>>
>>> David Lang
>>>
>>>
 On Thu, Mar 12, 2015 at 1:09 AM, David Lang  wrote:
>
>
> I just upgraded to liblognorm 1.1.1 (unfortunantly I didn't get a
> chance
> to
> compile it myself and test it earlier)
>
> I ran into two problems
>
> first, %last:rest% does not match if there is nothing left on the line
>
> i.e. a line that ends with an IP address will not match
> rule=:%ip:ipv4%%last:rest%
>
> secondly, liblognorm is selecting the rule that matches the least
> amount
> of
> the message.
>
> so with these two rules
>
> rule=:%ip:ipv4%%last:rest%
> rule=:%ip:ipv4%/%port:number%%last:rest%
>>
>>
>> I guess the hack I proposed above (using char-sep) can unblock you for
>> now, unless you hate its aesthetics too much :-).
>>
>
> 192.168.1.1/5 will get matched by the first rule, with '/5' in last,
> even
> though the second rule would match it. If I remove the first rule, the
> second rule does match and the parse succeeds.
>
> David Lang
>
>
> On Fri, 6 Feb 2015, David Lang wrote:
>
>> While I'm working to build packages of this to test with, what happens
>> if
>> you descend into a ruleset like the following
>>
>> rule=:%ip:ipv4%%last:rest%
>> rule=:%ip:ipv4%/%port:number%%last:rest%
>>
>> will it work to find the match that has the least left in last?
>>
>> David Lang
>>
>>
>> On Fri, 6 Feb 2015, singh.janmejay wrote:
>>
>>> It's going to be in the coming release, just master build for now.
>>>
>>> --
>>> Regards,
>>> Janmejay
>>>
>>> PS: Please blame the typos in this mail on my phone's uncivilized
>>> soft
>>> keyboard sporting it's not-so-smart-assist technology.
>>>
>>> On Feb 6, 2015 6:37 AM, "David Lang"  wrote:
>>>
 On Wed, 4 Feb 2015, singh.janmejay wrote:

  On Wed, Feb 4, 2015 at 6:22 PM, David Lang  wrote:
>
>
>
>
>  On Wed, 4 Feb 2015, singh.janmejay wrote:
>>
>>
>>
>>
>>  On Wed, Feb 4, 2015 at 7:17 AM, David Lang  wrote:
>>
>>>

Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread Chris Schafer
David,
As far as docs go, when I went into the documentation on liblognorm.com, I
found
http://www.liblognorm.com/files/manual/index.html

Which includes string-to. That said, I know it's there because I put the
function in, and if you have a suggestion for how to better document the
functions, that could lead to a wider acceptance of liblognorm.

On Thu, Mar 12, 2015 at 1:36 AM singh.janmejay 
wrote:

> It never goes back up because if any other rule was going to match the
> current line, it would be a subtree of the current node (this is an
> invariant).
>
> It does try all sub-trees from any node before giving up. It first
> tries all field-nodes, then appropriate literal-node.
>
> In this case anything at the end will be matched by rest, the only
> thing that rest will not match is string with 0 length, which the next
> rule won't match anyway.
>
> About 0-length suffix, I want to think a bit about how to support it
> with descent. As of now it expects a remaining-text field.
>
> Im unsure if this answers your question though.
>
> On Thu, Mar 12, 2015 at 1:05 PM, David Lang  wrote:
> > On Thu, 12 Mar 2015, singh.janmejay wrote:
> >
> >> On Thu, Mar 12, 2015 at 9:19 AM, David Lang  wrote:
> >>>
> >>> On Thu, 12 Mar 2015, singh.janmejay wrote:
> >>>
>  Tried re-ordering it? Put the one with /port first?
> >>>
> >>>
> >>>
> >>> no, lognorm rules are not supposed to be order dependent, so I didn't
> try
> >>> that (especially after finding things failing to parse with rsyslog
> that
> >>> worked manually)
> >>
> >>
> >> In case of input strings being matching-rule-wise disjoint, you are
> >> right, order won't matter. But when they are not disjoint, order does
> >> matter, because the first one to match the string wins.
> >>
> >> Consider this rulebase:
> >> rule=:%ip:ipv4%%last:rest%
> >> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
> >>
> >> If you write it the way I have above, you'll end up matching first
> >> rule for input 10.20.30.40/5
> >
> >
> > but when it can't find a match for / and has to undo the match and go
> back
> > up the tree, why doesn't it try the next possible match? (repeating as
> > needed until it has tried all possible branches of the tree)
> >
> > David Lang
> >
> >
> >> But if you write it this way:
> >> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
> >> rule=:%ip:ipv4%%last:rest%
> >>
> >> You'll end up matching the first one.
> >>
> >> I know it appears order independent for your original rulebase, but
> >> that is because fields are always tried first(in preference to
> >> subtrees hanging off literals), and rest is a field, while '/' creates
> >> a  litteral-subtree.
> >>
> >>>
>  Yes, rest must get atleast one char to succeed. I'll create some new
>  tests without rest-capture (and see what fails).
> >>>
> >>>
> >>>
> >>> Ok, this can be worked around (but it's a bit ugly), any reason why
> rest
> >>> has
> >>> to get at least one character?
> >>
> >>
> >> Yep, its annoying, it happens only for last token.
> >>
> >> The reason is, parsed-fragment length >= input-string is used as a
> >> termination condition for ln_normalize recursion (see ln_normalizeRec)
> >> and the last token identified when recursion terminates is not the
> >> terminal-node, so its not considered a complete match(one that goes
> >> till leaf of ptree).
> >>
> >>>
> >>> David Lang
> >>>
> >>>
>  On Thu, Mar 12, 2015 at 1:09 AM, David Lang  wrote:
> >
> >
> > I just upgraded to liblognorm 1.1.1 (unfortunantly I didn't get a
> > chance
> > to
> > compile it myself and test it earlier)
> >
> > I ran into two problems
> >
> > first, %last:rest% does not match if there is nothing left on the
> line
> >
> > i.e. a line that ends with an IP address will not match
> > rule=:%ip:ipv4%%last:rest%
> >
> > secondly, liblognorm is selecting the rule that matches the least
> > amount
> > of
> > the message.
> >
> > so with these two rules
> >
> > rule=:%ip:ipv4%%last:rest%
> > rule=:%ip:ipv4%/%port:number%%last:rest%
> >>
> >>
> >> I guess the hack I proposed above (using char-sep) can unblock you for
> >> now, unless you hate its aesthetics too much :-).
> >>
> >
> > 192.168.1.1/5 will get matched by the first rule, with '/5' in last,
> > even
> > though the second rule would match it. If I remove the first rule,
> the
> > second rule does match and the parse succeeds.
> >
> > David Lang
> >
> >
> > On Fri, 6 Feb 2015, David Lang wrote:
> >
> >> While I'm working to build packages of this to test with, what
> happens
> >> if
> >> you descend into a ruleset like the following
> >>
> >> rule=:%ip:ipv4%%last:rest%
> >> rule=:%ip:ipv4%/%port:number%%last:rest%
> >>
> >> will it work to find the match that has the least left in last?
> >>
> >> David Lang
> >>
> >>
> >> On Fri, 6 Feb 2015, singh.janmejay wrote:
> >>
> 

Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread Rainer Gerhards
2015-02-04 2:47 GMT+01:00 David Lang :

> as I'm spending a bunch of time making templates from cisco logs, a few
> thoughts on mmnormalize
>
> 1. It should probably set parsesuccess like mmjsonparse does
>
> 2. it would be useful to have something like char-to that accepted
> multiple characters as the termination pattern. thanks to the addition of
> toeknize I was able to work around this ('flags FIN ACK  on interface'
> where the number of flags listed is variable)
>
> 3. the number type should accept negative numbers, not just digits
>
>
> 4. it would be fantastic to be able to define custom types in the config
>
> example
>
> inside:1.2.3.4/56 is a pattern that happens a lot and I use
> %srciface:char-to:\x3a%\x3a%srcip:ipv4%/%srcport:number% and
> %dstiface:char-to:\x3a%\x3a%dstip:ipv4%/%dstport:number% to match this
> pattern
>
>
Florian thankfully found some old PIX logs, which I have been playing with
for the past few days. I also came across this syntax. It is possibly
something that a special parser would make sense for. I am working on a log
structure analyser, and this is one of the things it already finds rather
quickly when data is thrown at it. Unfortunately, I received 0 log
contributions, so it's very hard to find out what would be needed.


> , being able to define
>
> custom=info:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
>
> and then use "%src:info% to %dst:info% instead of that full pattern and
> have the resulting json be
> { src : { iface : inside, ip : 1.2.3.4, port : 56 }, { dst...
>
>
> 5. Going back to the 'or' question. It would be even better to be able to
> define this custom type as a set of patterns.
>
> while inside:1.2.3.4/56 is a common endpoint definition there are also
> 1.2.3.4/56 inside:1.2.3.4/56(string) inside/1.2.3.4 and 1.2.3.4
>
> if you could define the custom type to be a list of patterns this would
> let you take advantage of the two-dimentional nature of JSON and simplify
> the ruleset considerably.
>

These things already show up clearly in the structure analyzer. The idea is
to evolve lognorm based on the findings of the structure analyzer.


Just FYI,
Rainer


>
> It would also give you a good way to handle the 'or' for Apache logs for
> example defining one of the options as a constant '-'
>
> defining an 'or' instead each pattern is a horrible mess to try and
> understand, but if it's done by implementing a new type, I don't have a
> problem with it.
>
> David Lang


Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread Rainer Gerhards
2015-02-04 13:52 GMT+01:00 David Lang :

> On Wed, 4 Feb 2015, singh.janmejay wrote:
>
>  On Wed, Feb 4, 2015 at 7:17 AM, David Lang  wrote:
>>
>>  as I'm spending a bunch of time making templates from cisco logs, a few
>>> thoughts on mmnormalize
>>>
>>> 1. It should probably set parsesuccess like mmjsonparse does
>>>
>>>
>> This will be very useful.
>>
>>
>>
>>> 2. it would be useful to have something like char-to that accepted
>>> multiple characters as the termination pattern. thanks to the addition of
>>> toeknize I was able to work around this ('flags FIN ACK  on interface'
>>> where the number of flags listed is variable)
>>>
>>>
>> I felt the need for this too. I believe the recent string-to thing does
>> this?
>>
>
> I missed that. One thing that is wrong with liblognorm and mmnormalize is
> that the docs that are pointed to are horribly out of date and don't
> mention a lot of these capabilities. I cloned the source from github and
> was looking through it to find things, but apparently missed this one.
>
>
Mhh... I updated the web site to autoupdate from the repo doc. I just
checked and it looks fine. Do you really get the old doc? (the new one says
1.1.1 for example).

Rainer

>
>>
>>> 3. the number type should accept negative numbers, not just digits
>>>
>>>
>>> 4. it would be fantastic to be able to define custom types in the config
>>>
>>> example
>>>
>>> inside:1.2.3.4/56 is a pattern that happens a lot and I use
>>> %srciface:char-to:\x3a%\x3a%srcip:ipv4%/%srcport:number% and
>>> %dstiface:char-to:\x3a%\x3a%dstip:ipv4%/%dstport:number% to match this
>>> pattern
>>>
>>> , being able to define
>>>
>>> custom=info:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
>>>
>>> and then use "%src:info% to %dst:info% instead of that full pattern and
>>> have the resulting json be
>>> { src : { iface : inside, ip : 1.2.3.4, port : 56 }, { dst...
>>>
>>>
>>>
>> Field type 'descent' does this, but not exactly in the same way.
>>
>
> does it? I understood it to just be calling another ruleset on the whole
> line (doc problem again)
>
> David Lang
>
>
>
>>
>>> 5. Going back to the 'or' question. It would be even better to be able to
>>> define this custom type as a set of patterns.
>>>
>>> while inside:1.2.3.4/56 is a common endpoint definition there are also
>>> 1.2.3.4/56 inside:1.2.3.4/56(string) inside/1.2.3.4 and 1.2.3.4
>>>
>>> if you could define the custom type to be a list of patterns this would
>>> let you take advantage of the two-dimentional nature of JSON and simplify
>>> the ruleset considerably.
>>>
>>> It would also give you a good way to handle the 'or' for Apache logs for
>>> example defining one of the options as a constant '-'
>>>
>>> defining an 'or' instead each pattern is a horrible mess to try and
>>> understand, but if it's done by implementing a new type, I don't have a
>>> problem with it.
>>>
>>> David Lang


Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread Rainer Gerhards
2015-03-12 12:50 GMT+01:00 Rainer Gerhards :

> 2015-02-04 13:52 GMT+01:00 David Lang :
>
>> On Wed, 4 Feb 2015, singh.janmejay wrote:
>>
>>  On Wed, Feb 4, 2015 at 7:17 AM, David Lang  wrote:
>>>
>>>  as I'm spending a bunch of time making templates from cisco logs, a few
 thoughts on mmnormalize

 1. It should probably set parsesuccess like mmjsonparse does


>>> This will be very useful.
>>>
>>>
>>>
 2. it would be useful to have something like char-to that accepted
 multiple characters as the termination pattern. thanks to the addition
 of
 toeknize I was able to work around this ('flags FIN ACK  on interface'
 where the number of flags listed is variable)


>>> I felt the need for this too. I believe the recent string-to thing does
>>> this?
>>>
>>
>> I missed that. One thing that is wrong with liblognorm and mmnormalize is
>> that the docs that are pointed to are horribly out of date and don't
>> mention a lot of these capabilities. I cloned the source from github and
>> was looking through it to find things, but apparently missed this one.
>>
>>
> Mhh... I updated the web site to autoupdate from the repo doc. I just
> checked and it looks fine. Do you really get the old doc? (the new one says
> 1.1.1 for example).
>
>
sorry -- I didn't realize the early mails were from Feb... Just discard my
message ;)

Rainer

> Rainer
>
>>
>>>
 3. the number type should accept negative numbers, not just digits


 4. it would be fantastic to be able to define custom types in the config

 example

 inside:1.2.3.4/56 is a pattern that happens a lot and I use
 %srciface:char-to:\x3a%\x3a%srcip:ipv4%/%srcport:number% and
 %dstiface:char-to:\x3a%\x3a%dstip:ipv4%/%dstport:number% to match this
 pattern

 , being able to define

 custom=info:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%

 and then use "%src:info% to %dst:info% instead of that full pattern and
 have the resulting json be
 { src : { iface : inside, ip : 1.2.3.4, port : 56 }, { dst...



>>> Field type 'descent' does this, but not exactly in the same way.
>>>
>>
>> does it? I understood it to just be calling another ruleset on the whole
>> line (doc problem again)
>>
>> David Lang
>>
>>
>>
>>>
 5. Going back to the 'or' question. It would be even better to be able
 to
 define this custom type as a set of patterns.

 while inside:1.2.3.4/56 is a common endpoint definition there are also
 1.2.3.4/56 inside:1.2.3.4/56(string) inside/1.2.3.4 and 1.2.3.4

 if you could define the custom type to be a list of patterns this would
 let you take advantage of the two-dimentional nature of JSON and
 simplify
 the ruleset considerably.

 It would also give you a good way to handle the 'or' for Apache logs for
 example defining one of the options as a constant '-'

 defining an 'or' instead each pattern is a horrible mess to try and
 understand, but if it's done by implementing a new type, I don't have a
 problem with it.

 David Lang


Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread Rainer Gerhards
2015-03-12 5:55 GMT+01:00 singh.janmejay :

> On Thu, Mar 12, 2015 at 9:19 AM, David Lang  wrote:
> > On Thu, 12 Mar 2015, singh.janmejay wrote:
> >
> >> Tried re-ordering it? Put the one with /port first?
> >
> >
> > no, lognorm rules are not supposed to be order dependent, so I didn't try
> > that (especially after finding things failing to parse with rsyslog that
> > worked manually)
>
> In case of input strings being matching-rule-wise disjoint, you are
> right, order won't matter. But when they are not disjoint, order does
> matter, because the first one to match the string wins.
>
> Consider this rulebase:
> rule=:%ip:ipv4%%last:rest%
> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>
> If you write it the way I have above, you'll end up matching first
> rule for input 10.20.30.40/5
>
> But if you write it this way:
> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
> rule=:%ip:ipv4%%last:rest%
>
> You'll end up matching the first one.
>

This shouldn't happen. The theory is:

Let i be the current index to be looked at in the line. If for i a parser
is selected, parsers shall be tried first (in theory, according to parser
ordering, but I think this is not yet fully implemented). If a parser fits,
processing is advanced to the next tree node.

If the node at i does not have a parser (or all parsers failed, I think
[but not sure]), advance to the next node based on character match.

The order of appearance of rules inside the rulebase should not affect this.
If it does, it's either not yet implemented or a bug. This is also why I
don't like the "rest" syntax: it always matches and thus terminates
interpretation.


> I know it appears order independent for your original rulebase, but
> that is because fields are always tried first(in preference to
> subtrees hanging off literals), and rest is a field, while '/' creates
> a  litteral-subtree.
>
> >
> >> Yes, rest must get atleast one char to succeed. I'll create some new
> >> tests without rest-capture (and see what fails).
> >
> >
> > Ok, this can be worked around (but it's a bit ugly), any reason why rest
> has
> > to get at least one character?
>
> Yep, its annoying, it happens only for last token.
>
> The reason is, parsed-fragment length >= input-string is used as a
> termination condition for ln_normalize recursion (see ln_normalizeRec)
> and the last token identified when recursion terminates is not the
> terminal-node, so its not considered a complete match(one that goes
> till leaf of ptree).
>
> >
> > David Lang
> >
> >
> >> On Thu, Mar 12, 2015 at 1:09 AM, David Lang  wrote:
> >>>
> >>> I just upgraded to liblognorm 1.1.1 (unfortunately I didn't get a
> >>> chance to compile it myself and test it earlier)
> >>>
> >>> I ran into two problems
> >>>
> >>> first, %last:rest% does not match if there is nothing left on the line
> >>>
> >>> i.e. a line that ends with an IP address will not match
> >>> rule=:%ip:ipv4%%last:rest%
> >>>
> >>> secondly, liblognorm is selecting the rule that matches the least
> >>> amount of the message.
> >>>
> >>> so with these two rules
> >>>
> >>> rule=:%ip:ipv4%%last:rest%
> >>> rule=:%ip:ipv4%/%port:number%%last:rest%
>
> I guess the hack I proposed above (using char-sep) can unblock you for
> now, unless you hate its aesthetics too much :-).
>
> >>>
> >>> 192.168.1.1/5 will get matched by the first rule, with '/5' in last,
> >>> even though the second rule would match it. If I remove the first rule,
> >>> the second rule does match and the parse succeeds.
> >>>
> >>> David Lang

Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread David Lang

On Thu, 12 Mar 2015, Rainer Gerhards wrote:



> This shouldn't happen. The theory is:
>
> Let i be the current index to be looked at in the line. If for i a parser
> is selected, parsers shall be tried first (in theory, according to parser
> ordering, but I think this is not yet fully implemented). If a parser fits,
> processing is advanced to the next tree node.
>
> If the node at i does not have a parser (or all parsers failed, I think
> [but not sure]), advance to the next node based on character match.
>
> The order of appearance of rules inside the rulebase should not affect this.
> If it does, it's either not yet implemented or a bug. This is also why I
> don't like the "rest" syntax: it always matches and thus terminates
> interpretation.


I'll post a simple test case when I get into the office in a bit.

In this particular case, it's failing to check other parsers when it hits a 
failure and backs up.


But there are other cases where multiple rules may match. stringto, rest, 
iptables are all things that can easily match a lot of data where other rules 
may also match by having more specific listings. In such cases it should still 
be deterministic which rule 'wins'. I can think of a few ways to define this.


1. fewest parsers needed wins

2. most parsers needed wins

3. ordering of parsers, where the 'greedier' ones are put last so they only come 
into play if the more specific ones don't match.
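
To make the three options concrete, here is a minimal Python sketch that
applies each policy to the two rules from earlier in the thread. It assumes
both rules match a hypothetical input such as "192.168.1.1/5 up"; the
GREEDINESS ranks and the function names are invented for illustration and are
not part of liblognorm.

```python
# Hypothetical greediness ranks: lower means more specific.
GREEDINESS = {"ipv4": 0, "number": 0, "char-to": 1,
              "word": 2, "stringto": 2, "rest": 3}

def fewest_parsers(candidates):
    """Option 1: the match that needed the fewest parsers wins."""
    return min(candidates, key=lambda c: len(c["parsers"]))

def most_parsers(candidates):
    """Option 2: the match that needed the most parsers wins."""
    return max(candidates, key=lambda c: len(c["parsers"]))

def greedy_last(candidates):
    """Option 3: at the first position where two matches differ, prefer the
    less greedy parser. Literal text between fields is ignored here, which
    is a simplification."""
    return min(candidates, key=lambda c: [GREEDINESS[p] for p in c["parsers"]])

# Both rules are assumed to match the same input line.
candidates = [
    {"rule": "%ip:ipv4%%last:rest%",
     "parsers": ["ipv4", "rest"]},
    {"rule": "%ip:ipv4%/%port:number%%last:rest%",
     "parsers": ["ipv4", "number", "rest"]},
]

for policy in (fewest_parsers, most_parsers, greedy_last):
    print(policy.__name__, "->", policy(candidates)["rule"])
```

Under these assumptions option 1 picks the shorter rule, while options 2 and 3
both pick the rule that extracts the port.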


David Lang
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread Rainer Gerhards
2015-03-12 16:41 GMT+01:00 David Lang :

> I'll post a simple test case when I get into the office in a bit.
>
> In this particular case, it's failing to check other parsers when it hits
> a failure and backs up.
>
> But there are other cases where multiple rules may match. stringto, rest,


word, stringto are "last resort parsers", to be used only if anything else
fails.
rest IMHO should never be used, but I think I can propose something in the
future that solves the need that comes with it (if there still is a need at
that point).


> iptables


iptables is a different story, it's actually for a different type of logs -
at least I think so now. I am unfortunately not prepared to discuss this
right now, as I want to keep concentrated on the log structure analyzer. It
doesn't help if I do a bit of everything without anything ever nearing
completion ;)


> are all things that can easily match a lot of data where other rules may
> also match by having more specific listings. In such cases it should still
> be deterministic which rule 'wins'. I can think of a few ways to define
> this.
>
> 1. fewest parsers needed wins
>
> 2. most parsers needed wins
>
> 3. ordering of parsers, where the 'greedier' ones are put last so they
> only come into play if the more specific ones don't match.
>
>
That's the designed approach, and I am very sure it's the right one. As I
said, it's at least not fully implemented.

This also means we need many more specific parsers. I never got there,
because of a) time shortage and b) lack of sufficient log samples, where
"log samples" means not a single line or two but at least several thousand,
so that I can evaluate false positives. While b) is still a very big problem
for me, a) has been much relaxed thanks to the thesis work. Also, work on
the semi-automatic rule creator looks promising. As it is a heuristic, the
lack of log samples unfortunately is a very large stumbling block.

Rainer


Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread singh.janmejay
On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards
 wrote:
> 2015-03-12 16:41 GMT+01:00 David Lang :
>
>> On Thu, 12 Mar 2015, Rainer Gerhards wrote:
>>
>>> This shouldn't happen. The theory is:
>>>
>>> Let i be the current index to be looked at in the line. If for i a parser
>>> is selected, parsers shall be tried first (in theory, according to parser
>>> ordering, but I think this is not yet fully implemented). If a parser
>>> fits, processing is advanced to the next tree node.
>>>
>>> If the node at i does not have a parser (or all parsers failed, I think
>>> [but not sure]), advance to the next node based on character match.

This is precisely what it does.

>>>
>>> The order of appearance of rules inside the rulebase should not affect
>>> this.

It doesn't for literal-subtrees, but it does for field-subtrees,
because they are inserted at the tail of the linked list.

This code (https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394)
adds new subtrees at the end of the linked list, which is what causes the
ordering-sensitive behaviour.
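
A toy Python model of that effect, looking only at the node reached after the
ipv4 field has consumed "10.20.30.40" and "/5" is left. This is a sketch, not
the ptree.c code: load, match and the parser functions are hypothetical, the
char-sep behaviour (zero or more characters up to the separator) is an
assumption, and the list stands in for the linked list of field-subtrees.

```python
def parse_rest(s, i):
    # Per the thread, rest must consume at least one character.
    return len(s) - i if i < len(s) else None

def parse_char_sep_slash(s, i):
    # Assumed char-sep behaviour: zero or more characters up to '/'.
    j = s.find("/", i)
    return (len(s) if j == -1 else j) - i

def parse_number(s, i):
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    return j - i if j > i else None

REST_RULE = ("rest rule", [("last", parse_rest)])
PORT_RULE = ("port rule",
             [("junk", parse_char_sep_slash), "/", ("port", parse_number)])

def load(rules):
    subtrees = []
    for rule in rules:
        subtrees.append(rule)  # tail insertion keeps rulebase order
    return subtrees

def match(subtrees, s):
    for label, steps in subtrees:  # first complete match wins
        i, fields = 0, {}
        for step in steps:
            if isinstance(step, str):          # literal text
                if not s.startswith(step, i):
                    break
                i += len(step)
            else:                              # field parser
                name, parser = step
                used = parser(s, i)
                if used is None:
                    break
                fields[name] = s[i:i + used]
                i += used
        else:
            if i == len(s):
                return label, fields
    return None

print(match(load([REST_RULE, PORT_RULE]), "/5"))
print(match(load([PORT_RULE, REST_RULE]), "/5"))
```

Both alternatives can consume "/5", so whichever was loaded first is the one
that gets reported.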

>>> If it does, it's either not yet implemented or a bug. This is also why I
>>> don't like the "rest" syntax: it always matches and thus terminates
>>> interpretation.
>>>
>>
>> I'll post a simple test case when I get into the office in a bit.
>>
>> In this particular case, it's failing to check other parsers when it hits
>> a failure and backs up.
>>
>> But there are other cases where multiple rules may match. stringto, rest,
>
>
> word, stringto are "last resort parsers", to be used only if anything else
> fails.
> rest IMHO should never be used, but I think I can propose something in the
> future that solves the need that comes with it (if there still is a need at
> that point).
>
>
>> iptables
>
>
> iptables is a different story, it's actually for a different type of logs -
> at least I think so now. I am unfortunately not prepared to discuss this
> right now, as I want to keep concentrated on the log structure analyzer. It
> doesn't help if I do a bit of everything without anything ever nearing
> completion ;)
>
>
>> are all things that can easily match a lot of data where other rules may
>> also match by having more specific listings. In such cases it should still
>> be deterministic which rule 'wins'. I can think of a few ways to define
>> this.
>>
>> 1. fewest parsers needed wins
>>
>> 2. most parsers needed wins

This is probably the closest simple approximation to best match.

I was thinking about this too.

>>
>> 3. ordering of parsers, where the 'greedier' ones are put last so they
>> only come into play if the more specific ones don't match.

We could assist it by setting relative weights etc. E.g. ipv4 gets
weight 10, but rest gets only 1.

Once we get the coefficients right, this can probably be achieved (it's
like a cost-based picker, run once after the ptree has been loaded, to sort
all subtree lists by cost in one shot).
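
A minimal sketch of such a one-shot sort, in Python rather than the library's
C. The weights for ipv4 and rest are the ones floated above; the other
weights, the Node class and sort_subtrees are hypothetical and only meant to
show the idea.

```python
# ipv4=10 and rest=1 come from the proposal above; the rest are made up.
WEIGHTS = {"ipv4": 10, "number": 8, "char-to": 4, "rest": 1}

class Node:
    def __init__(self):
        self.fields = []  # list of (parser_type, child Node)

def sort_subtrees(node):
    """Sort every field list by descending weight, once, after loading.
    list.sort is stable, so equal weights keep their rulebase order."""
    node.fields.sort(key=lambda f: WEIGHTS.get(f[0], 0), reverse=True)
    for _, child in node.fields:
        sort_subtrees(child)

root = Node()
for parser_type in ("rest", "char-to", "ipv4"):  # rulebase order
    root.fields.append((parser_type, Node()))

sort_subtrees(root)
print([parser_type for parser_type, _ in root.fields])
```

After the sort the specific parsers are tried first and rest only comes into
play last, regardless of where its rule appeared in the rulebase.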

>>
>>
> That's the designed approach, and I am very sure it's the right one. As I
> said, it's at least not fully implemented.
>
> This also means we need many more specific parsers. I never get there,
> because of a) time shortage and b) lack of sufficient log samples. Where
> log samples is not a single line or two, but at least several thousands, so
> that I can evaluate false positives. While b) is still a very big problem
> to me, a) has been much relaxed thanks to the thesis work. Also, work on
> the semi-automatic rule creator looks promising. As it is a heuristic, the
> lack of log samples unfortunately is a very large hindering block.
>
> Rainer

Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread Rainer Gerhards
2015-03-12 18:16 GMT+01:00 singh.janmejay :

> On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards
>  wrote:
> > 2015-03-12 16:41 GMT+01:00 David Lang :
> >
> >> On Thu, 12 Mar 2015, Rainer Gerhards wrote:
> >>
> >>> This shouldn't happen. The theory is:
> >>>
> >>> Let i be the current index to be looked at in the line. If for i a
> >>> parser is selected, parsers shall be tried first (in theory, according
> >>> to parser ordering, but I think this is not yet fully implemented). If
> >>> a parser fits, processing is advanced to the next tree node.
> >>>
> >>> If the node at i does not have a parser (or all parsers failed, I think
> >>> [but not sure]), advance to the next node based on character match.
>
> This is precisely what it does.
>
> >>>
> >>> The order of appearance of rules inside the rulebase should not affect
> >>> this.
>
> It doesn't for literal-subtrees, but it does for field-subtrees,
> because they are inserted at the tail of the linked list.
>
> This code (
> https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394)
> adds new subtrees at the end of the linked list, which is what causes the
> ordering-sensitive behaviour.
>
>
OK, it seems like I overlooked this effect. I don't think it is good to
have any order dependence. Anyway, the work I am carrying out will most
probably lead to algorithmic changes and I'll re-evaluate that when I reach
that point (not soon). Of course, I won't break anything that exists. If
things diverge too much, I'll add an alternate library. But again, this
remains to be seen and it is too early to think about this.

On the ordering issue: are you sure that the order is always properly
preserved? I never put any effort into it (as order was designed to be
irrelevant) and some reordering (IIRC) happens intentionally (parser
priorities).

Rainer



Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread singh.janmejay
I haven't seen the reordering code yet, but the loading does preserve order.

It is still deterministic, just that the criterion is rule order (and
its being applicable only to field-subtrees makes it slightly odd).

On Thu, Mar 12, 2015 at 10:55 PM, Rainer Gerhards
 wrote:
> On the ordering issue: are you sure that the order is always properly
> preserved? I never put any effort into it (as order was designed to be
> irrelevant) and some reordering (IIRC) happens intentionally (parser
> priorities).
>

Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread David Lang

On Thu, 12 Mar 2015, David Lang wrote:


> I'll post a simple test case when I get into the office in a bit.


# %ASA-6-302013: Built outbound TCP connection 190101710 for Outside:10.1.50.85/514 (10.1.50.85/514) to inside:10.51.50.88/34423 (10.51.50.88/34423)

# %ASA-6-302013: Built inbound TCP connection 46818840 for outside:192.168.200.117/53137 (192.168.200.117/53137)(LOCALCP-7945G-SEP00235E17E438) to outside:192.168.200.1/2000 (192.168.200.1/2000) (CP-7945G-SEP00235E17E438)

# %ASA-6-302013: Built inbound TCP connection 51708529 for outside:10.1.50.50/55474 (10.1.50.50/55474) to backup:192.168.200.130/1753 (192.168.200.130/1753)(LOCALCP-7945G-SEPC40ACB4CBDF7)

# %ASA-6-302013: Built inbound TCP connection 53349356 for outside:192.168.200.150/59220 (192.168.200.150/59220)(LOCAL\\David.Adler) to outside:192.168.200.36/3283 (192.168.200.36/3283)(LOCAL\\CP-7945G-SEP189C5D21800C) (David.Adler)

rule=cisco,ASA-6-302013: \x25ASA-6-302013\x3a Built %direction:word% %proto:word% connection %connection-id:number% for %source:descent:/root/cisco.endpoint% (%sourcenat:descent:/root/cisco.endpoint%) to %dest:descent:/root/cisco.endpoint% (%destnat:descent:/root/cisco.endpoint%)

rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)
rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number%
rule=:%ip:ipv4%/%port:number%%tail:rest%
rule=:%ip:ipv4%
rule=:%ip:ipv4% %tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%%tail:rest%

David Lang






Re: [rsyslog] mmnormalize thoughts

2015-03-12 Thread David Lang

On Thu, 12 Mar 2015, singh.janmejay wrote:


> I haven't seen the reordering code yet, but the loading does preserve order.
>
> It is still deterministic, just that the criterion is rule order (and
> its being applicable only to field-subtrees makes it slightly odd).


this is definitely an issue

looking at my cisco.endpoint ruleset

originally I had:

rule=:%ip:ipv4%%tail:rest%
rule=:%ip:ipv4%/%port:number%%tail:rest%
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)

After learning about the rest issue I duplicated each line without the 
%tail:rest% at the end
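
The duplication step can be scripted. A small Python sketch, assuming every
rule sits on one line and the optional capture is literally %tail:rest% at
the very end; with_and_without_rest is a hypothetical helper, not a liblognorm
tool.

```python
def with_and_without_rest(rules, suffix="%tail:rest%"):
    """For each rule ending in the suffix, emit the rule without it first,
    then the original. Rules that are already present are not repeated."""
    out = []
    for rule in rules:
        if rule.endswith(suffix):
            variants = [rule[:-len(suffix)], rule]
        else:
            variants = [rule]
        for variant in variants:
            if variant not in out:
                out.append(variant)
    return out

rules = [
    "rule=:%ip:ipv4%%tail:rest%",
    "rule=:%ip:ipv4%/%port:number%%tail:rest%",
]
for line in with_and_without_rest(rules):
    print(line)
```

Each input rule yields a pair: the variant for a line that ends right after
the last field, followed by the variant that captures trailing text.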


it still doesn't work unless I disable the rules with rest in them

so after the discussion on ordering, I tried reversing all the rules; it still 
didn't work because the char-to matches better than the ipv4.


so for the moment I have the rules as:

rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)
rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number%
rule=:%ip:ipv4%/%port:number%%tail:rest%
rule=:%ip:ipv4%
rule=:%ip:ipv4%%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%%tail:rest%

but I'm not sure whether this will really work without testing every specific 
case, because I don't know where the order is going to matter, and the char-to 
may match in cases where the rest of the rule doesn't match, and then it won't 
fall through to the shorter match.


order dependency is not the right answer.

Why does this need to be added to the end of the tree rather than being 
positioned like any other rule components?


David Lang




On Thu, Mar 12, 2015 at 10:55 PM, Rainer Gerhards
 wrote:

2015-03-12 18:16 GMT+01:00 singh.janmejay :


On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards
 wrote:

2015-03-12 16:41 GMT+01:00 David Lang :


On Thu, 12 Mar 2015, Rainer Gerhards wrote:

 2015-03-12 5:55 GMT+01:00 singh.janmejay :


 On Thu, Mar 12, 2015 at 9:19 AM, David Lang  wrote:



On Thu, 12 Mar 2015, singh.janmejay wrote:

 Tried re-ordering it? Put the one with /port first?





no, lognorm rules are not supposed to be order dependent, so I didn't try
that (especially after finding things failing to parse with rsyslog that
worked manually)



In case of input strings being matching-rule-wise disjoint, you are
right, order won't matter. But when they are not disjoint, order does
matter, because the first one to match the string wins.

Consider this rulebase:
rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%

If you write it the way I have above, you'll end up matching the first
rule for input 10.20.30.40/5

But if you write it this way:
rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
rule=:%ip:ipv4%%last:rest%

You'll again end up matching the first one, which is now the rule with /port.



This shouldn't happen. The theory is:

Let i be the current index to be looked at in the line. If for i a parser
is selected, parsers shall be tried first (in theory, according to parser
ordering, but I think this is not yet fully implemented). If a parser fits,
processing is advanced to the next tree node.

If the node at i does not have a parser (or all parsers failed, I think
[but not sure]), advance to the next node based on character match.


This is precisely what it does.



The order of appearance of rules inside the rulebase should not affect
this.


It doesn't for literal-subtree, but it does for field-subtree,
because they are inserted at the tail of the linked-list.
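
To make the difference concrete, here is a toy model in Python. It is not the 
liblognorm code: Node, add_rule, walk and the simplified parsers are invented 
for this illustration, and the backtracking in walk is an assumption. Literal 
children are keyed by character, so rule order cannot affect them; field 
children are appended to a list and tried in that order.

```python
import re

IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")   # simplified, no range check
NUMBER = re.compile(r"\d+")


def parse_ipv4(line, i):
    m = IPV4.match(line, i)
    return m.end() if m else None


def parse_number(line, i):
    m = NUMBER.match(line, i)
    return m.end() if m else None


def parse_rest(line, i):
    return len(line)                     # always succeeds, eats everything


def parse_to_slash(line, i):             # stand-in for char-sep:/
    j = line.find("/", i)
    return j if j != -1 else None


class Node:
    def __init__(self):
        self.literals = {}               # char -> Node, order-independent
        self.fields = []                 # (name, parser, Node), in rule order
        self.terminal = False


def add_rule(root, steps):
    node = root
    for step in steps:
        if step[0] == "lit":
            for ch in step[1]:
                node = node.literals.setdefault(ch, Node())
            continue
        _, name, parser = step
        for fname, fparser, child in node.fields:
            if fname == name and fparser is parser:
                node = child             # share an identical field subtree
                break
        else:
            child = Node()
            node.fields.append((name, parser, child))   # added at the tail
            node = child
    node.terminal = True


def walk(node, line, i=0):
    for name, parser, child in node.fields:             # fields tried first
        end = parser(line, i)
        if end is not None:
            found = walk(child, line, end)
            if found is not None:
                return {name: line[i:end], **found}
    if i < len(line) and line[i] in node.literals:
        return walk(node.literals[line[i]], line, i + 1)
    return {} if i == len(line) and node.terminal else None


def build(rules):
    root = Node()
    for rule in rules:
        add_rule(root, rule)
    return root


REST_RULE = [("field", "ip", parse_ipv4), ("field", "last", parse_rest)]
PORT_RULE = [("field", "ip", parse_ipv4), ("field", "junk", parse_to_slash),
             ("lit", "/"), ("field", "port", parse_number)]

print(walk(build([REST_RULE, PORT_RULE]), "10.20.30.40/5"))
# {'ip': '10.20.30.40', 'last': '/5'}
print(walk(build([PORT_RULE, REST_RULE]), "10.20.30.40/5"))
# {'ip': '10.20.30.40', 'junk': '', 'port': '5'}
```

In this model both rules share the ip node, and the only thing that changes 
between the two runs is the order of the entries in that node's field list.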

This code (
https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394)
adds new subtrees at the end of linked-list, which is what causes the
ord

Re: [rsyslog] mmnormalize thoughts

2015-03-13 Thread Rainer Gerhards
2015-03-13 1:26 GMT+01:00 David Lang :

> On Thu, 12 Mar 2015, singh.janmejay wrote:
>
>  I haven't seen the reordering code yet, but the loading does preserve
>> order.
>>
>> It still is deterministic, just that the criteria is rule-order (and
>> it being applicable only for field-subtrees makes it slightly odd).
>>
>
> this is definitely an issue
>
> looking at my cisco.endpoint ruleset
>
> originally I had:
>
> rule=:%ip:ipv4%%tail:rest%
> rule=:%ip:ipv4%/%port:number%%tail:rest%
> rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
> rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
> rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
>
> After learning about the rest issue I duplicated each line without the
> %tail:rest% at the end
>
> still not working without disabling the items with rest in them
>
> so after the discussion on ordering, I tried reversing all the rules, it
> still didn't work because the char-to matches better than the ipv4.
>
> so for the moment I have the rules as:
>
> rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
> rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
> rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)
> rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
> rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)
> rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
> rule=:%ip:ipv4%/%port:number%
> rule=:%ip:ipv4%/%port:number%%tail:rest%
> rule=:%ip:ipv4%
> rule=:%ip:ipv4%%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
> rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%%tail:rest%
>
> but I'm not sure if this really will work or not without testing every
> specific case because I don't know where the order is going to matter, and
> the char-to may match cases where it isn't going to match the rest of the
> rule and it won't fall through to the shorter match.
>
> order dependency is not the right answer.
>
>
I currently consider this a bug that needs to be fixed at some time. But
again, I don't think *now* is the right time to do so (at least not for me).

Rainer

> Why does this need to be added to the end of the tree rather than being
> positioned like any other rule components?
>
> David Lang
>
>
>
>
>  On Thu, Mar 12, 2015 at 10:55 PM, Rainer Gerhards
>>  wrote:
>>
>>> 2015-03-12 18:16 GMT+01:00 singh.janmejay :
>>>
>>>  On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards
  wrote:

> 2015-03-12 16:41 GMT+01:00 David Lang :
>
>  On Thu, 12 Mar 2015, Rainer Gerhards wrote:
>>
>>  2015-03-12 5:55 GMT+01:00 singh.janmejay :
>>
>>>
>>>  On Thu, Mar 12, 2015 at 9:19 AM, David Lang  wrote:
>>>

  On Thu, 12 Mar 2015, singh.janmejay wrote:
>
>  Tried re-ordering it? Put the one with /port first?
>
>>
>>
>
> no, lognorm rules are not supposed to be order dependent, so I
> didn't
> try
> that (especially after finding things failing to parse with rsyslog
>
 that

> worked manually)
>
>
 In case of input strings being matching-rule-wise disjoint, you are
 right, order won't matter. But when they are not disjoint, order
 does
 matter, because the first one to match the string wins.

 Consider this rulebase:
 rule=:%ip:ipv4%%last:rest%
 rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%

 If you write it the way I have above, you'll end up matching first
 rule for input 10.20.30.40/5

 But if you write it this way:
 rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
 rule=:%ip:ipv4%%last:rest%

 You'll end up matching the first one.


  This shouldn't happen. The theory is:
>>>
>>> Let i be the current index to be look