Re: [rsyslog] liblognorm vs grok

2016-12-20 Thread mostolog--- via rsyslog

Just created https://github.com/rsyslog/liblognorm/issues/236


On 20/12/16 at 11:58, mosto...@gmail.com wrote:

On 20/12/16 at 11:55, Rainer Gerhards wrote:

2016-12-20 11:54 GMT+01:00 mostolog--- via rsyslog:

Must first line be...

"version=2" (v lowercase)

this, see http://www.liblognorm.com/files/manual/configuration.html#rulebase-versions

Already did, but it's still failing, that's why I'm asking

version=2

rule=:%[
{"type":"alternative","parser":[
{"type":"literal", "text":"a"}
]},
{"type":"literal", "text":"a"}
]%

echo "a" | /usr/lib/lognorm/lognormalizer -r /etc/rsyslog.d/apps/rb/_a.rb
liblognorm error: rulebase file /etc/rsyslog.d/apps/rb/_a.rb[8]: invalid record type detected: ']%'
{ "originalmsg": "a", "unparsed-data": "a" }




Rainer


or

"Version=2" (V uppercase)

?

On 14/12/16 at 10:44, mosto...@gmail.com wrote:


On 07/12/16 at 21:00, Rainer Gerhards wrote:

I'm getting /invalid field type 'alternative'/ when using it. Any ideas?

rule=test:%[
{"type":"alternative","parser":[
{"type":"literal","text":"-"},
{"type":"word","name":"identd"}
 ]}
]%

no idea
Did you Set Version=2 in the First line?

Yes.


___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
LIKE THAT.





Re: [rsyslog] liblognorm vs grok

2016-12-20 Thread Rainer Gerhards
2016-12-20 11:54 GMT+01:00 mostolog--- via rsyslog :
> Must first line be...
>
> "version=2" (v lowercase)

this, see 
http://www.liblognorm.com/files/manual/configuration.html#rulebase-versions

Rainer

>
> or
>
> "Version=2" (V uppercase)
>
> ?
>
> On 14/12/16 at 10:44, mosto...@gmail.com wrote:
>
>> On 07/12/16 at 21:00, Rainer Gerhards wrote:
>>>
>>>
 I'm getting /invalid field type 'alternative'/ when using it. Any ideas?

rule=test:%[
{"type":"alternative","parser":[
{"type":"literal","text":"-"},
{"type":"word","name":"identd"}
 ]}
]%
>>>
>>> no idea
>>> Did you Set Version=2 in the First line?
>>
>> Yes.
>>
>

Re: [rsyslog] liblognorm vs grok

2016-12-20 Thread mostolog--- via rsyslog

Must first line be...

"version=2" (v lowercase)

or

"Version=2" (V uppercase)

?

On 14/12/16 at 10:44, mosto...@gmail.com wrote:

On 07/12/16 at 21:00, Rainer Gerhards wrote:


I'm getting /invalid field type 'alternative'/ when using it. Any 
ideas?


   rule=test:%[
   {"type":"alternative","parser":[
   {"type":"literal","text":"-"},
   {"type":"word","name":"identd"}
]}
   ]%

no idea
Did you Set Version=2 in the First line?

Yes.





Re: [rsyslog] liblognorm vs grok

2016-12-14 Thread mostolog--- via rsyslog

On 07/12/16 at 21:00, Rainer Gerhards wrote:



I'm getting /invalid field type 'alternative'/ when using it. Any ideas?

   rule=test:%[
   {"type":"alternative","parser":[
   {"type":"literal","text":"-"},
   {"type":"word","name":"identd"}
]}
   ]%

no idea
Did you Set Version=2 in the First line?

Yes.



Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread Rainer Gerhards
Sent from phone, thus brief.

On 07.12.2016 at 20:10, "David Lang" wrote:

On Wed, 7 Dec 2016, mosto...@gmail.com wrote:

you either use alternative or you have two different rule lines
>>
> I'm getting /invalid field type 'alternative'/ when using it. Any ideas?
>
>   rule=test:%[
>   {"type":"alternative","parser":[
>   {"type":"literal","text":"-"},
>   {"type":"word","name":"identd"}
>]}
>   ]%
>

no idea


Did you Set Version=2 in the First line?



it would be nice if -v only showed you the part we normally care about,
>> there may be a way to get just this portion, but I don't know how
>>
> I didn't notice any difference between -v, -vv and -vvv, so perhaps it's a
> bug/not implemented/something to ask to @rgerhards
>

I think it is the same. There is always room for improvement, but we need to
prioritize things if we want to get something done. I would love to have
better debugging, but it needs to be written :-( There is also an option in
liblognorm to include the matching parsers in the output, but I think this
is not available in the package.


> this looks like it's undoing things, it may be an artifact of using a
>> custom type (misleading at best)
>>
>> and we've undone everything.
>>
> No idea...does it make sense to declare "longer matching rules" first?
> AKA: combined before common.
>

it really doesn't matter (minor speed difference for putting most commonly
matched rules first, but no difference in parsing accuracy)


Even this depends on the optimization stage.

Rainer



now we look at the second message (it helps understand this if you only
>> look at one at a time, one rule and one log message)
>>
>>   To normalize: '127.0.0.1 - - [17/Mar/2016:18:15:24 +0100] "OPTIONS /
>>>
>> did not find the field useragent, so backing up (probably end-of-line
>> problem)
>>
> It was that, indeed.
>
> Thanks for such a long and instructive reply! ;)
>

now you know how to read that debug output, you will find it really helpful
when you just can't see why a rule doesn't match :-)

David Lang



Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread David Lang

On Wed, 7 Dec 2016, mosto...@gmail.com wrote:


you either use alternative or you have two different rule lines

I'm getting /invalid field type 'alternative'/ when using it. Any ideas?

  rule=test:%[
  {"type":"alternative","parser":[
  {"type":"literal","text":"-"},
  {"type":"word","name":"identd"}
   ]}
  ]%


no idea

it would be nice if -v only showed you the part we normally care about, 
there may be a way to get just this portion, but I don't know how
I didn't notice any difference between -v, -vv and -vvv, so perhaps it's a 
bug/not implemented/something to ask to @rgerhards


this looks like it's undoing things, it may be an artifact of using a 
custom type (misleading at best)


and we've undone everything.

No idea...does it make sense to declare "longer matching rules" first?
AKA: combined before common.


it really doesn't matter (minor speed difference for putting most commonly 
matched rules first, but no difference in parsing accuracy)


now we look at the second message (it helps understand this if you only 
look at one at a time, one rule and one log message)



  To normalize: '127.0.0.1 - - [17/Mar/2016:18:15:24 +0100] "OPTIONS /
did not find the field useragent, so backing up (probably end-of-line 
problem)

It was that, indeed.

Thanks for such a long and instructive reply! ;)


now you know how to read that debug output, you will find it really helpful when 
you just can't see why a rule doesn't match :-)


David Lang


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com


a literal space is always more efficient than whitespace, only use 
whitespace if there can be more than one space, or tabs

Ok.


just a note, the new syntax is not always better than the old syntax

127.0.0.1 - - [17/Mar/2016:18:15:06 +0100] "GET /redacted HTTP/1.1" 
200 59506


type=@apache_common:%ip:ipv4% %ident:word% %user:word% 
[%date:char-to:]%] "%request:char-to:"%" %response:number% %bytes:rest%

Indeed. switched to old syntax and everything is working...¬¬

   type=@apache_common:%ip:ipv4% %ident:word% %user:word%
   [%date:char-to:]%] "%method:word%%-:whitespace%%request:char-to: %
   HTTP/%httpversion:float%" %response:number% %bytes:word%
   # ] this comment here fixes highlighting
   rule=access_common:%.:@apache_common%
   # .
   rule=access_combined:%.:@apache_common% %referrer:quoted-string%
   %useragent:quoted-string%
   # .


note that bytes really should be type number, but that requires a 
trailing space right now.
Actually, as it is sometimes "-", I must use word, which doesn't seem to 
have issues with SP/LF





  rule=access_combined:%[
   {"type":"@apache_common", "name":"."},
   {"type":"@apache_combined","name":"."}
  ]%


this is looking for one after the other, not either

you either use alternative or you have two different rule lines

I'm getting /invalid field type 'alternative'/ when using it. Any ideas?

   rule=test:%[
   {"type":"alternative","parser":[
   {"type":"literal","text":"-"},
   {"type":"word","name":"identd"}
]}
   ]%


when looking at the trace, everything before the "To normalize:" is 
probably not that useful (it's needed if you think the ruleset isn't 
being parsed correctly, but not to try and figure out why the log line 
isn't being parsed correctly)

Ok

it would be nice if -v only showed you the part we normally care 
about, there may be a way to get just this portion, but I don't know how
I didn't notice any difference between -v, -vv and -vvv, so perhaps it's 
a bug/not implemented/something to ask to @rgerhards


this looks like it's undoing things, it may be an artifact of using a 
custom type (misleading at best)


and we've undone everything.

No idea...does it make sense to declare "longer matching rules" first?
AKA: combined before common.




  normalized: '{ "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:06
  +0100] \"GET \/redacted HTTP\/1.1\" 200 59506", "unparsed-data": "" }'
ok, now I understand this, it parsed the message with @apache_common 
and got to position 77 (the end of the message), but that wasn't the 
end of the rule, so the parsing failed, and it failed with nothing 
left to parse

Understood. Hope it won't happen again.

now we look at the second message (it helps understand this if you 
only look at one at a time, one rule and one log message)



  To normalize: '127.0.0.1 - - [17/Mar/2016:18:15:24 +0100] "OPTIONS /
did not find the field useragent, so backing up (probably end-of-line 
problem)

It was that, indeed.

Thanks for such a long and instructive reply! ;)


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread David Lang

On Wed, 7 Dec 2016, mosto...@gmail.com wrote:


{"type":"@apache" name="."} ?


actually, %{"type":"@apache", "name":"."}%

This is one of the places where I like to use the older, more compact 
syntax :-)
Older/Compact doesn't seem to have an alternative, reason why I started using 
JSON syntax...right?


but just because you use json for one section, it doesn't mean you have to use 
json for everything.


use what makes the most sense for the portion you are working on.

David Lang


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread David Lang

On Wed, 7 Dec 2016, mosto...@gmail.com wrote:

I'm still trying to reproduce/understand what is happening and building a 
test case for the github issue if needed.


Consider the following HTTP access lines:

  127.0.0.1 - - [17/Mar/2016:18:15:06 +0100] "GET /redacted HTTP/1.1"
  200 59506
  127.0.0.1 - - [17/Mar/2016:18:15:24 +0100] "OPTIONS / HTTP/1.1" 403
  205 "-" "-"

And the following rule:

  # This is just access_log. Perhaps literal is more efficient than
  whitespace?


a literal space is always more efficient than whitespace, only use whitespace if 
there can be more than one space, or tabs



  type=@apache_common:%[
   {"type":"ipv4", "name":"ip"},
   {"type":"whitespace"},
   {"type":"word", "name":"ident"},
   {"type":"whitespace"},
   {"type":"word", "name":"user"},
   {"type":"literal", "text":" ["},
   {"type":"char-to", "name":"date", "extradata":"]"},
   {"type":"literal", "text":"] \""},
   {"type":"word", "name":"method"},
   {"type":"whitespace"},
   {"type":"char-to", "name":"request", "extradata":" "},
   {"type":"literal", "text":" HTTP/"},
   {"type":"float", "name":"httpversion"},
   {"type":"literal", "text":"\""},
   {"type":"whitespace"},
   {"type":"number", "name":"response"},
   {"type":"whitespace"},
   {"type":"word", "name":"bytes"}
  ]%


just a note, the new syntax is not always better than the old syntax

127.0.0.1 - - [17/Mar/2016:18:15:06 +0100] "GET /redacted HTTP/1.1" 200 59506

type=@apache_common:%ip:ipv4% %ident:word% %user:word% [%date:char-to:]%] 
"%request:char-to:"%" %response:number% %bytes:rest%


note that bytes really should be type number, but that requires a trailing 
space right now.




  #AFAIK this should accept null or apache combined log fields
  type=@apache_combined:-
  type=@apache_combined:%[
   {"type":"whitespace"},
   {"type":"quoted-string","name":"referrer"},
   {"type":"whitespace"},
   {"type":"quoted-string","name":"useragent"}
  ]%
  rule=access_combined:%[
   {"type":"@apache_common", "name":"."}
  ]%

*As expected*, the first line matches and the second doesn't:

  { "bytes": "59506", "response": "200", "httpversion": "1.1",
  "request": "\/redacted", "method": "GET", "date":
  "17\/Mar\/2016:18:15:06 +0100", "user": "-", "ident": "-", "ip":
  "127.0.0.1" }
  { "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:24 +0100]
  \"OPTIONS \/ HTTP\/1.1\" 403 205 \"-\" \"-", "unparsed-data": "
  \"-\" \"-" }

But if we try:

  rule=access_combined:%[
   {"type":"@apache_common", "name":"."},
   {"type":"@apache_combined","name":"."}
  ]%


this is looking for one after the other, not either

you either use alternative or you have two different rule lines
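The difference between "one after the other" and "two different rule lines" can be sketched in a few lines of Python. This is only a toy model: the helper names (literal, sequence, first_matching_rule) and the shortened sample strings are made up for illustration and have nothing to do with liblognorm's actual code. It models only the sequencing issue; the second log line in this thread failed for a separate end-of-line reason that the sketch does not cover.

```python
def literal(text):
    """Parser that matches exactly `text` at the current offset."""
    def parse(message, pos):
        return pos + len(text) if message.startswith(text, pos) else None
    return parse


def sequence(*parsers):
    """All parsers must match, one after the other."""
    def parse(message, pos):
        for parser in parsers:
            pos = parser(message, pos)
            if pos is None:
                return None
        return pos
    return parse


def first_matching_rule(rules, message):
    """Return the name of the first rule that consumes the whole message."""
    for name, parser in rules:
        if parser(message, 0) == len(message):
            return name
    return None


# Hypothetical stand-ins for @apache_common and the combined-only fields.
common = literal("200 59506")
combined_tail = literal(' "-" "-"')

# One rule listing both types: the tail is mandatory.
one_rule = [("access_combined", sequence(common, combined_tail))]

# Two rule lines: either shape is accepted.
two_rules = [
    ("access_common", common),
    ("access_combined", sequence(common, combined_tail)),
]
```

With one_rule, a message that stops after the common part is rejected because the sequence still expects the tail. With two_rules, each message shape has its own rule, so both are accepted.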


Doesn't parse any of them properly, and we're getting:

  { "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:06 +0100] \"GET
  \/redacted HTTP\/1.1\" 200 59506", "unparsed-data": "" }
  { "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:24 +0100]
  \"OPTIONS \/ HTTP\/1.1\" 403 205 \"-\" \"-", "unparsed-data": "
  \"-\" \"-" }


Here's trace:




when looking at the trace, everything before the "To normalize:" is probably not 
that useful (it's needed if you think the ruleset isn't being parsed correctly, 
but not to try and figure out why the log line isn't being parsed correctly)


it would be nice if -v only showed you the part we normally care about, there 
may be a way to get just this portion, but I don't know how
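Until the tool can do this itself, one workaround is to post-process a captured trace. The Python sketch below is not a lognormalizer feature; the function name is made up, and it assumes only that each message's portion of the trace starts with a line beginning "To normalize:", as in the output quoted in this thread.

```python
def normalization_sections(trace_lines):
    """Yield one list of lines per 'To normalize:' section of a trace.

    Everything before the first 'To normalize:' line (rulebase loading,
    DAG dump) is dropped.
    """
    section = None
    for line in trace_lines:
        if line.lstrip().startswith("To normalize:"):
            if section is not None:
                yield section
            section = [line]
        elif section is not None:
            section.append(line)
    if section is not None:
        yield section
```

Feeding it the lines of a saved trace gives one chunk per log message, which makes it easier to look at one message at a time.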




  To normalize: '127.0.0.1 - - [17/Mar/2016:18:15:06 +0100] "GET
  /redacted HTTP/1.1" 200 59506'
  liblognorm: 0: enter parser, dag node 0x7f97c606d0a0, json
  0x7f97c6071590
  liblognorm: 0/0:trying 'USER-DEFINED' parser for field '.', data
  'UNKNOWN'
  liblognorm: calling custom parser '@apache_common'
  liblognorm: 0: enter parser, dag node 0x7f97c606e650, json
  0x7f97c60700f0
  liblognorm: 0/1:trying 'ipv4' parser for field 'ip', data 'UNKNOWN'
  liblognorm: parser lookup returns 0, pParsed 9
  liblognorm: 0: potential hit, trying subtree 0x7f97c606f050


found the ipv4 (9 bytes, now at position 9)


  liblognorm: 9: enter parser, dag node 0x7f97c606f050, json
  0x7f97c60700f0
  liblognorm: 9/1:trying 'whitespace' parser for field '(null)', data
  'UNKNOWN'
  liblognorm: parser lookup returns 0, pParsed 1
  liblognorm: 9: potential hit, trying subtree 0x7f97c606f180


found whitespace (1 byte, now at position 10)


  liblognorm: 10: enter parser, dag node 0x7f97c606f180, json
  0x7f97c60700f0
  liblognorm: 10/1:trying 'word' parser for field 'ident', data 'UNKNOWN'
  liblognorm: parser lookup returns 0, pParsed 1
  liblognorm: 10: potential hit, trying subtree 0x7f97c606f3e0


found ident (1 byte, now at position 11)


  liblognorm: 11: enter parser, dag node 0x7f97c606f3e0, json
  0x7f97c60700f0
  liblognorm: 11/1:trying 'whitespace' parser for field '(null)', data
  'UNKNOWN'
  liblognorm: parser lookup returns 0, pParsed 1
  liblognorm: 11: potential 
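The manual annotations above ("found the ipv4 (9 bytes, now at position 9)") can be approximated mechanically. The Python sketch below is a rough aid, not part of liblognorm: the function name is made up, the regular expressions are based solely on the two line shapes visible in this trace, and it assumes (as the annotations suggest) that "returns 0" means a match and that pParsed is the number of bytes consumed.

```python
import re

# "liblognorm: 9/1:trying 'whitespace' parser for field '(null)', data ..."
TRY_RE = re.compile(
    r"liblognorm: (\d+)/\d+:trying '([^']+)' parser for field '([^']+)'"
)
# "liblognorm: parser lookup returns 0, pParsed 1"
HIT_RE = re.compile(r"liblognorm: parser lookup returns (-?\d+), pParsed (\d+)")


def summarize_trace(lines):
    """Return (parser, field, start, end) for every successful lookup."""
    steps = []
    pending = None
    for line in lines:
        match = TRY_RE.search(line)
        if match:
            # Remember which parser is being tried and at which offset.
            pending = (int(match.group(1)), match.group(2), match.group(3))
            continue
        match = HIT_RE.search(line)
        if match and pending is not None:
            offset, parser, field = pending
            return_code, parsed = int(match.group(1)), int(match.group(2))
            if return_code == 0:  # assumption: 0 signals a match
                steps.append((parser, field, offset, offset + parsed))
            pending = None
    return steps
```

The last tuple in the result shows how far the parse got before it started backing up, which is usually where the rule and the log line disagree.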

Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com



that is the same type of bug, just for another type.

just add a note that we need to allow end of line for all types, it's 
not limited to space.
I'm missing code commenting...probably I'm going to switch back to ~doc 
tasks :P




Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com



{"type":"@apache" name="."} ?


actually, %{"type":"@apache", "name":"."}%

This is one of the places where I like to use the older, more compact 
syntax :-)
Older/Compact doesn't seem to have an alternative, reason why I started 
using JSON syntax...right?




Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com
I'm still trying to reproduce/understand what is happening and building 
a test case for the github issue if needed.


Consider the following HTTP access lines:

   127.0.0.1 - - [17/Mar/2016:18:15:06 +0100] "GET /redacted HTTP/1.1"
   200 59506
   127.0.0.1 - - [17/Mar/2016:18:15:24 +0100] "OPTIONS / HTTP/1.1" 403
   205 "-" "-"

And the following rule:

   # This is just access_log. Perhaps literal is more efficient than
   whitespace?
   type=@apache_common:%[
{"type":"ipv4", "name":"ip"},
{"type":"whitespace"},
{"type":"word", "name":"ident"},
{"type":"whitespace"},
{"type":"word", "name":"user"},
{"type":"literal", "text":" ["},
{"type":"char-to", "name":"date", "extradata":"]"},
{"type":"literal", "text":"] \""},
{"type":"word", "name":"method"},
{"type":"whitespace"},
{"type":"char-to", "name":"request", "extradata":" "},
{"type":"literal", "text":" HTTP/"},
{"type":"float", "name":"httpversion"},
{"type":"literal", "text":"\""},
{"type":"whitespace"},
{"type":"number", "name":"response"},
{"type":"whitespace"},
{"type":"word", "name":"bytes"}
   ]%

   #AFAIK this should accept null or apache combined log fields
   type=@apache_combined:-
   type=@apache_combined:%[
{"type":"whitespace"},
{"type":"quoted-string","name":"referrer"},
{"type":"whitespace"},
{"type":"quoted-string","name":"useragent"}
   ]%
   rule=access_combined:%[
{"type":"@apache_common", "name":"."}
   ]%

*As expected*, the first line matches and the second doesn't:

   { "bytes": "59506", "response": "200", "httpversion": "1.1",
   "request": "\/redacted", "method": "GET", "date":
   "17\/Mar\/2016:18:15:06 +0100", "user": "-", "ident": "-", "ip":
   "127.0.0.1" }
   { "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:24 +0100]
   \"OPTIONS \/ HTTP\/1.1\" 403 205 \"-\" \"-", "unparsed-data": "
   \"-\" \"-" }

But if we try:

   rule=access_combined:%[
{"type":"@apache_common", "name":"."},
{"type":"@apache_combined","name":"."}
   ]%

Doesn't parse any of them properly, and we're getting:

   { "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:06 +0100] \"GET
   \/redacted HTTP\/1.1\" 200 59506", "unparsed-data": "" }
   { "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:24 +0100]
   \"OPTIONS \/ HTTP\/1.1\" 403 205 \"-\" \"-", "unparsed-data": "
   \"-\" \"-" }


Here's trace:

   liblognorm: loading rulebase file '/test/apps/10-apache.rb'
   liblognorm: rulebase version is 2

   liblognorm: read rulebase line[~25]: 'type=@apache_common:%[
   {"type":"ipv4", "name":"ip"},{"type":"whitespace"},
   {"type":"word", "name":"ident"},{"type":"whitespace"},
   {"type":"word", "name":"user"},{"type":"literal", "text":"
   ["},{"type":"char-to", "name":"date", "extradata":"]"},
   {"type":"literal", "text":"] \""},{"type":"word",
   "name":"method"},{"type":"whitespace"}, {"type":"char-to",
   "name":"request", "extradata":" "}, {"type":"literal", "text":"
   HTTP/"},{"type":"float", "name":"httpversion"},   
   {"type":"literal", "text":"\""}, {"type":"whitespace"},   
   {"type":"number", "name":"response"},{"type":"whitespace"},   
   {"type":"word", "name":"bytes"}]%'

   liblognorm: type line to add: '@apache_common:%[{"type":"ipv4",
   "name":"ip"},{"type":"whitespace"},{"type":"word",
   "name":"ident"},{"type":"whitespace"},{"type":"word",
   "name":"user"},{"type":"literal", "text":" ["},
   {"type":"char-to", "name":"date", "extradata":"]"},
   {"type":"literal", "text":"] \""},{"type":"word",
   "name":"method"},{"type":"whitespace"}, {"type":"char-to",
   "name":"request", "extradata":" "}, {"type":"literal", "text":"
   HTTP/"},{"type":"float", "name":"httpversion"},   
   {"type":"literal", "text":"\""}, {"type":"whitespace"},   
   {"type":"number", "name":"response"},{"type":"whitespace"},   
   {"type":"word", "name":"bytes"}]%'

   liblognorm: type name is '@apache_common'
   liblognorm: type line to add: '%[{"type":"ipv4", "name":"ip"},
   {"type":"whitespace"},{"type":"word", "name":"ident"},
   {"type":"whitespace"},{"type":"word", "name":"user"},
   {"type":"literal", "text":" ["},{"type":"char-to",
   "name":"date", "extradata":"]"},{"type":"literal", "text":"]
   \""},{"type":"word", "name":"method"}, {"type":"whitespace"},   
   {"type":"char-to", "name":"request", "extradata":" "},   
   {"type":"literal", "text":" HTTP/"}, {"type":"float",
   "name":"httpversion"},{"type":"literal", "text":"\""},   
   {"type":"whitespace"},{"type":"number", "name":"response"},   
   {"type":"whitespace"},{"type":"word", "name":"bytes"}]%'

   liblognorm: ln_pdagFindType, name '@apache_common', bAdd: 1, nTypes 0
   liblognorm: custom type '@apache_common' does not yet exist, adding...
   liblognorm: addSampToTree 0 of 

Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread David Lang

On Wed, 7 Dec 2016, mosto...@gmail.com wrote:

almost, %@apache% makes no more sense than %word%, you need to give the 
match a name


so %log:@apache% would work, or if you want to move everything up a layer 
(rather than having $!apache!ip) you could do %.:@apache%

That should work

How would that be using JSON syntax?
{"type":"@apache" name="."} ?


actually, %{"type":"@apache", "name":"."}%

This is one of the places where I like to use the older, more compact syntax :-)

David Lang


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread David Lang

On Wed, 7 Dec 2016, mosto...@gmail.com wrote:



I think it's a problem, several of the types require a space at the end, and
I think they should all be modified to allow either a space or an
end-of-line.

ack. It's on my list for early next year.

better check if one exists, I also think David created one. This is
for the liblognorm project.
According to https://github.com/rsyslog/liblognorm/issues/207 the problem 
could be:

https://github.com/rsyslog/liblognorm/blob/master/src/parser.c#L2869
am I right? Should it accept SP and (\n) LF? If that's all I could PR...


that is the same type of bug, just for another type.

just add a note that we need to allow end of line for all types, it's not 
limited to space.
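To make the requested behaviour concrete, here is a toy Python model of a numeric field parser with and without the relaxed terminator. It is not the code in parser.c; the function name and the allow_end_of_line switch are made up, and treating both LF and end of input as "end of line" is an assumption.

```python
def parse_number(text, pos, allow_end_of_line):
    """Return the offset just past the digits, or None if there is no match."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None  # no digits at all
    if end < len(text):
        if text[end] == " ":
            return end  # strict and relaxed variants both accept a space
        # Relaxed variant: a line feed also terminates the field.
        return end if allow_end_of_line and text[end] == "\n" else None
    # Digits ran up to the end of the message.
    return end if allow_end_of_line else None
```

In the strict variant the last field of "200 59506" is rejected because nothing follows it, which matches the "requires a trailing space" behaviour described in this thread. The relaxed variant accepts it.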


David Lang


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com




I think it's a problem, several of the types require a space at the end, and
I think they should all be modified to allow either a space or an
end-of-line.

ack. It's on my list for early next year.

better check if one exists, I also think David created one. This is
for the liblognorm project.
According to https://github.com/rsyslog/liblognorm/issues/207 the 
problem could be:

https://github.com/rsyslog/liblognorm/blob/master/src/parser.c#L2869
am I right? Should it accept SP and (\n) LF? If that's all I could PR...





* A or B (doc states it does)
* A or nothing (that was my real question)


I'm not sure if you can have a blank item on one branch or not. If not, can
you have the branches both include a required item? (either the one before
or the one after)

I *think* (but do not know for sure) this might work. Otherwise I'll add it
early next year as well. Conceptually, it really is an alternative with
a void branch.


Once I get an "alternative" working, I'll try to have an empty branch. If it
doesn't work, I'll open an issue :)

I think one already exists as well, from Radu.

Rainer


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com


when troubleshooting things like this, create a rule file that is as 
minimal as you can get and parse with the -v option, it will show you 
what it's doing as it walks through the line.


I don't see how it parsed each message. Perhaps a debug option must be 
enabled?


   number of tree nodes: 20
   liblognorm: COMPONENT: @apache
   liblognorm: subDAG 0x7f97bae1a650 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'ipv4', name 'ip': 'UNKNOWN': called 0
   liblognorm: field type 'ipv4', name 'ip': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1b050 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN': called 0
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1b180 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'word', name 'ident': 'UNKNOWN': called 0
   liblognorm: field type 'word', name 'ident': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1b3e0 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN': called 0
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1b610 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'word', name 'user': 'UNKNOWN': called 0
   liblognorm: field type 'word', name 'user': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1b780 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'literal', name '(null)': ' [': called 0
   liblognorm: field type 'literal', name '(null)': ' [':
   liblognorm: subDAG 0x7f97bae1b820 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'char-to', name 'date': 'UNKNOWN': called 0
   liblognorm: field type 'char-to', name 'date': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1bc30 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'literal', name '(null)': '] "': called 0
   liblognorm: field type 'literal', name '(null)': '] "':
   liblognorm: subDAG 0x7f97bae1bdc0 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'word', name 'method': 'UNKNOWN': called 0
   liblognorm: field type 'word', name 'method': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1c050 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN': called 0
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1c3c0 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'char-to', name 'request': 'UNKNOWN': called 0
   liblognorm: field type 'char-to', name 'request': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1c530 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'literal', name '(null)': ' HTTP/': called 0
   liblognorm: field type 'literal', name '(null)': ' HTTP/':
   liblognorm: subDAG 0x7f97bae1cbd0 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'float', name 'httpversion': 'UNKNOWN': called 0
   liblognorm: field type 'float', name 'httpversion': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1cd50 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'literal', name '(null)': '"': called 0
   liblognorm: field type 'literal', name '(null)': '"':
   liblognorm: subDAG 0x7f97bae1cf90 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN': called 0
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1d200 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'number', name 'response': 'UNKNOWN': called 0
   liblognorm: field type 'number', name 'response': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1d350 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN': called 0
   liblognorm: field type 'whitespace', name '(null)': 'UNKNOWN':
   liblognorm: subDAG 0x7f97bae1d6e0 (children: 1 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: field type 'word', name 'bytes': 'UNKNOWN': called 0
   liblognorm: field type 'word', name 'bytes': 'UNKNOWN':
   liblognorm: subDAG [TERM] 0x7f97bae1da80 (children: 0 parsers, ref
   1) [called 0, backtracked 0]
   liblognorm: MAIN COMPONENT:
   liblognorm: subDAG 0x7f97bae190a0 (children: 0 parsers, ref 1)
   [called 0, backtracked 0]
   liblognorm: MAIN COMPONENT (alternative):
   liblognorm: 0x7f97bae190a0[ref 1]:
   To normalize: '127.0.0.1 - - [17/Mar/2016:18:06:58 +0100] "GET
   /redacted HTTP/1.1" 200 62957'
   liblognorm: 0: enter parser, dag node 0x7f97bae190a0, json
   0x7f97bae1ba20
   liblognorm: offs 0, strLen 102, isTerm 0
   liblognorm: 0 returns 

Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com




almost, %@apache% makes no more sense than %word%, you need to give 
the match a name


so %log:@apache% would work, or if you want to move everything up a
layer (rather than having $!apache!ip) you could do %.:@apache%

That should work

How would that be using JSON syntax?
{"type":"@apache" name="."} ?
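As written, the snippet above is not valid JSON: the name key needs quotes, a colon, and a separating comma. One way to check a parser object before pasting it into a rulebase is to run it through a JSON parser. This is a minimal Python sketch; is_valid_json is a hypothetical helper, and whether liblognorm treats "name":"." on a custom type the same way as %.:@apache% is an assumption based on the suggestion quoted above, not something confirmed in this thread.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

as_asked = '{"type":"@apache" name="."}'      # the form asked about above
corrected = '{"type":"@apache", "name":"."}'  # same intent, valid JSON

print(is_valid_json(as_asked))   # False
print(is_valid_json(corrected))  # True
```

This only checks JSON syntax; it says nothing about whether liblognorm accepts the field type or name.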


___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com

Should something like this work?

{"type":"@apache"},
{"type":"alternative","parser":[
{},
{
{"type":"whitespace"},
...
}
]}


On 07/12/16 at 11:08, Rainer Gerhards wrote:

2016-12-07 10:38 GMT+01:00 mosto...@gmail.com :



In this case, I seem to remember that number is defined as being followed
by a space, so you can't use it if the number is followed by a newline.

I'll have to confirm that...but may I know why? Should I file an issue if
it's indeed that way?

I think it's a problem, several of the types require a space at the end, and
I think they should all be modified to allow either a space or an
end-of-line.

ack. It's on my list for early next year.

May I create an issue somewhere?

better check if one exists, I also think David created one. This is
for the liblognorm project.


* A or B (doc states it does)
* A or nothing (that was my real question)

I'm not sure if you can have a blank item on one branch or not. If not, can
you have the branches both include a required item? (either the one before
or the one after)

I *think* (but do not know for sure) this might work. Otherwise I'll add it
early next year as well. Conceptually, it really is an alternative with
a void branch.

Once I get an "alternative" working, I'll try to have an empty branch. If it
doesn't work, I'll open an issue :)

I think one already exists as well, from Radu.

Rainer


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread Rainer Gerhards
2016-12-07 10:38 GMT+01:00 mosto...@gmail.com :
>>>>>
>>>>> In this case, I seem to remember that number is defined as being
>>>>> followed by a space, so you can't use it if the number is followed
>>>>> by a newline.
>>>>
>>>> I'll have to confirm that...but may I know why? Should I file an issue
>>>> if it's indeed that way?
>>>
>>>
>>> I think it's a problem, several of the types require a space at the end, and
>>> I think they should all be modified to allow either a space or an
>>> end-of-line.
>>
>> ack. It's on my list for early next year.
>
> May I create an issue somewhere?

better check if one exists, I also think David created one. This is
for the liblognorm project.

>
>>
>>>
>>>> * A or B (doc states it does)
>>>> * A or nothing (that was my real question)
>>>
>>>
>>> I'm not sure if you can have a blank item on one branch or not. If not,
>>> can
>>> you have the branches both include a required item? (either the one
>>> before
>>> or the one after)
>>
>> I *think* (but do not know for sure) this might work. Otherwise I'll add it
>> early next year as well. Conceptually, it really is an alternative with
>> a void branch.
>
>
> Once I get an "alternative" working, I'll try to have an empty branch. If it
> doesn't work, I'll open an issue :)

I think one already exists as well, from Radu.

Rainer


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com



In this case, I seem to remember that number is defined as being followed
by a space, so you can't use it if the number is followed by a newline.

I'll have to confirm that...but may I know why? Should I file an issue if
it's indeed that way?


I think it's a problem, several of the types require a space at the end, and
I think they should all be modified to allow either a space or an
end-of-line.

ack. It's on my list for early next year.

May I create an issue somewhere?






* A or B (doc states it does)
* A or nothing (that was my real question)


I'm not sure if you can have a blank item on one branch or not. If not, can
you have the branches both include a required item? (either the one before
or the one after)

I *think* (but do not know for sure) this might work. Otherwise I'll add it
early next year as well. Conceptually, it really is an alternative with
a void branch.


Once I get an "alternative" working, I'll try to have an empty branch. 
If it doesn't work, I'll open an issue :)


Thank you all!


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread Rainer Gerhards
2016-12-07 9:11 GMT+01:00 David Lang :
> On Wed, 7 Dec 2016, mosto...@gmail.com wrote:
>
>>> when troubleshooting things like this, create a rule file that is as
>>> minimal as you can get and parse with the -v option, it will show you what
>>> it's doing as it walks through the line.
>>
>> Ok :)
>>
>>>
>>> In this case, I seem to remember that number is defined as being followed
>>> by a space, so you can't use it if the number is followed by a newline.
>>
>> I'll have to confirm that...but may I know why? Should I file an issue if
>> it's indeed that way?
>
>
> I think it's a problem, several of the types require a space at the end, and
> I think they should all be modified to allow either a space or an
> end-of-line.

ack. It's on my list for early next year.

>
> I don't remember if I've opened an issue for this or not.
>
>>> there is the alternative capability in the v2 language, or define
>>> multiple rules
>>
>> Multiple rules is what I'm trying now. I have tested alternative and I'm not
>> able to get it working.
>> Does alternative work for both...?
>>
>> * A or B (doc states it does)
>> * A or nothing (that was my real question)
>
>
> I'm not sure if you can have a blank item on one branch or not. If not, can
> you have the branches both include a required item? (either the one before
> or the one after)

I *think* (but do not know for sure) this might work. Otherwise I'll add it
early next year as well. Conceptually, it really is an alternative with
a void branch.

Rainer


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread David Lang

On Wed, 7 Dec 2016, mosto...@gmail.com wrote:

when troubleshooting things like this, create a rule file that is as 
minimal as you can get and parse with the -v option, it will show you what 
it's doing as it walks through the line.

Ok :)



In this case, I seem to remember that number is defined as being followed 
by a space, so you can't use it if the number is followed by a newline.
I'll have to confirm that...but may I know why? Should I file an issue if
it's indeed that way?


I think it's a problem, several of the types require a space at the end, and I
think they should all be modified to allow either a space or an end-of-line.


I don't remember if I've opened an issue for this or not.

there is the alternative capability in the v2 language, or define multiple 
rules
Multiple rules is what I'm trying now. I have tested alternative and I'm not
able to get it working.

Does alternative work for both...?

* A or B (doc states it does)
* A or nothing (that was my real question)


I'm not sure if you can have a blank item on one branch or not. If not, can you 
have the branches both include a required item? (either the one before or the 
one after)
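
As a sketch of the "A or B" case that the documentation is said to cover, the rule below lets a field be either the literal "-" or a number. It is generated from Python data with json.dumps so that the JSON part is well-formed by construction. render_rule and the tag "bytes" are hypothetical names chosen for illustration; the rule=tag:%[ ... ]% layout and the "alternative"/"parser" keys are copied from the rulebases posted in this thread. Whether an empty ("A or nothing") branch is accepted is not confirmed here, so none is used.

```python
import json

def render_rule(tag, parsers):
    """Render one rule line in the %[ ... ]% JSON-array form used in this thread."""
    return "rule=" + tag + ":%" + json.dumps(parsers) + "%"

# "A or B": the field is either the literal "-" or a number named "bytes".
bytes_field = {
    "type": "alternative",
    "parser": [
        {"type": "literal", "text": "-"},
        {"type": "number", "name": "bytes"},
    ],
}

print("version=2")
print(render_rule("bytes", [bytes_field]))
```

The printed text can be saved as a rulebase file and tried with lognormalizer; this sketch does not claim that liblognorm will accept or match it.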


David Lang


Re: [rsyslog] liblognorm vs grok

2016-12-07 Thread mosto...@gmail.com


when troubleshooting things like this, create a rule file that is as 
minimal as you can get and parse with the -v option, it will show you 
what it's doing as it walks through the line.

Ok :)



In this case, I seem to remember that number is defined as being 
followed by a space, so you can't use it if the number is followed by 
a newline.
I'll have to confirm that...but may I know why? Should I file an issue
if it's indeed that way?


almost, %@apache% makes no more sense than %word%, you need to give 
the match a name


so %log:@apache% would work, or if you want to move everything up a
layer (rather than having $!apache!ip) you could do %.:@apache%

That should work

there is the alternative capability in the v2 language, or define 
multiple rules
Multiple rules is what I'm trying now. I have tested alternative and I'm
not able to get it working.

Does alternative work for both...?

 * A or B (doc states it does)
 * A or nothing (that was my real question)

Regards


Re: [rsyslog] liblognorm vs grok

2016-12-05 Thread David Lang

On Mon, 5 Dec 2016, mosto...@gmail.com wrote:


I forgot:

With the provided rule file...why am I getting a bunch of these errors when
using /usr/lib/lognorm/lognormalizer?


{ "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:31 +0100] \"GET 
\/redacted\/page HTTP\/1.1\" 200 1234", "unparsed-data": "" }


when troubleshooting things like this, create a rule file that is as minimal as 
you can get and parse with the -v option, it will show you what it's doing as it 
walks through the line.


In this case, I seem to remember that number is defined as being followed by a 
space, so you can't use it if the number is followed by a newline.
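
A minimal sketch of that troubleshooting approach in Python: write a one-rule v2 rulebase and build the lognormalizer command line with the -r and -v options mentioned in this thread. The helper names are hypothetical, the binary path is the one quoted in the thread and may differ per distribution, and the rule %n:number% is only an illustrative probe for the trailing-space recollection above.

```python
import subprocess
import tempfile

LOGNORMALIZER = "/usr/lib/lognorm/lognormalizer"  # path quoted in this thread

def write_rulebase(rules):
    """Write a v2 rulebase containing the given rule lines; return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".rb", delete=False) as fh:
        fh.write("version=2\n")
        for rule in rules:
            fh.write(rule + "\n")
        return fh.name

def build_command(rulebase_path, verbose=True):
    """Build the command line from the -r and -v options mentioned in the thread."""
    cmd = [LOGNORMALIZER, "-r", rulebase_path]
    if verbose:
        cmd.append("-v")
    return cmd

def normalize(rulebase_path, message):
    """Feed one message on stdin, like: echo "msg" | lognormalizer -r file -v"""
    return subprocess.run(build_command(rulebase_path), input=message + "\n",
                          capture_output=True, text=True)

# A single field per rulebase keeps the -v trace short enough to read.
path = write_rulebase(["rule=:%n:number%"])
print(" ".join(build_command(path)))
```

Calling normalize(path, "200") on a machine with liblognorm installed would then show whether a number at the very end of the line matches; the result is not asserted here.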


David Lang



On 05/12/16 at 15:41, mosto...@gmail.com wrote:

Hi


Coming back to liblognorm, I have a few questions I'd love an expert
reply to.  0:D


*- Documentation [1] states how to define a type, but not how to use 
it. Are we properly using defined type "apache" in the configuration 
below?*


- Apache access logs seem to have 2 formats: common and combined [2]
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif 
HTTP/1.0" 200 2326
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif 
HTTP/1.0" 200 2326 "referrer" "useragent"

*How should we define our rulesets to have /optional/ fields?

- Our current workaround is to have a defined type and use it as part 
of a longer rule. Is that ok?*


*- How could we define logic to set a field to "0" when content is
"-"? (bytes field)*


type=@apache:%[
{"type":"ipv4", "name":"ip"},
{"type":"literal", "text":" "},
{"type":"word", "name":"ident"},
{"type":"literal", "text":" "},
{"type":"word", "name":"user"},
{"type":"literal", "text":" ["},
{"type":"char-to", "name":"date", "extradata":"]"},
{"type":"literal", "text":"] \""},
{"type":"word", "name":"method"},
{"type":"literal", "text":" "},
{"type":"char-to", "name":"request", "extradata":" "},
{"type":"literal", "text":" HTTP/"},
{"type":"float", "name":"httpversion"},
{"type":"literal", "text":"\" "},
{"type":"number", "name":"response"},
{"type":"literal", "text":" "},
{"type":"number", "name":"bytes"}
]%

rule=access:%[
{"type":"@apache"},
{"type":"literal", "text":"\""},
{"type":"char-to", "name":"referrer", "extradata":"\""},
{"type":"literal", "text":"\""},
{"type":"char-to", "name":"useragent", "extradata":"\""}
]%
rule=access:%@apache%

[1] http://www.liblognorm.com/files/manual/configuration.html
[2] https://httpd.apache.org/docs/2.4/logs.html#accesslog



Re: [rsyslog] liblognorm vs grok

2016-12-05 Thread David Lang

On Mon, 5 Dec 2016, mosto...@gmail.com wrote:


Hi


Coming back to liblognorm, I have a few questions I'd love an expert reply to.
0:D


*- Documentation [1] states how to define a type, but not how to use it. Are 
we properly using defined type "apache" in the configuration below?*


almost, %@apache% makes no more sense than %word%, you need to give the match a 
name


so %log:@apache% would work, or if you want to move everything up a layer
(rather than having $!apache!ip) you could do %.:@apache%



- Apache access log seem to have 2 formats: common and combined [2]
   127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif 
HTTP/1.0" 200 2326
   127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 
200 2326 "referrer" "useragent"

*How should we define our rulesets to have /optional/ fields?


there is the alternative capability in the v2 language, or define multiple rules

David Lang


Re: [rsyslog] liblognorm vs grok

2016-12-05 Thread mosto...@gmail.com

I forgot:

With the provided rule file...why am I getting a bunch of these errors when
using /usr/lib/lognorm/lognormalizer?


{ "originalmsg": "127.0.0.1 - - [17\/Mar\/2016:18:15:31 +0100] \"GET 
\/redacted\/page HTTP\/1.1\" 200 1234", "unparsed-data": "" }
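
In the output above, the presence of "originalmsg" and "unparsed-data" is what marks the message as not normalized. A small Python sketch for spotting such lines in bulk follows; parse_failed is a hypothetical helper, and treating "unparsed-data" as the failure marker is an assumption drawn only from the outputs shown in this thread.

```python
import json

def parse_failed(output_line):
    """True if a lognormalizer JSON line carries the failure key seen in this thread."""
    event = json.loads(output_line)
    return "unparsed-data" in event

# The output quoted above, joined back into a single line.
sample = ('{ "originalmsg": "127.0.0.1 - - [17\\/Mar\\/2016:18:15:31 +0100] '
          '\\"GET \\/redacted\\/page HTTP\\/1.1\\" 200 1234", "unparsed-data": "" }')

print(parse_failed(sample))                   # True
print(parse_failed('{ "ip": "127.0.0.1" }'))  # False
```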



On 05/12/16 at 15:41, mosto...@gmail.com wrote:

Hi


Coming back to liblognorm, I have a few questions I'd love an expert
reply to.  0:D


*- Documentation [1] states how to define a type, but not how to use 
it. Are we properly using defined type "apache" in the configuration 
below?*


- Apache access logs seem to have 2 formats: common and combined [2]
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif 
HTTP/1.0" 200 2326
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif 
HTTP/1.0" 200 2326 "referrer" "useragent"

*How should we define our rulesets to have /optional/ fields?

- Our current workaround is to have a defined type and use it as part 
of a longer rule. Is that ok?*


*- How could we define logic to set a field to "0" when content is
"-"? (bytes field)*


type=@apache:%[
{"type":"ipv4", "name":"ip"},
{"type":"literal", "text":" "},
{"type":"word", "name":"ident"},
{"type":"literal", "text":" "},
{"type":"word", "name":"user"},
{"type":"literal", "text":" ["},
{"type":"char-to", "name":"date", "extradata":"]"},
{"type":"literal", "text":"] \""},
{"type":"word", "name":"method"},
{"type":"literal", "text":" "},
{"type":"char-to", "name":"request", "extradata":" "},
{"type":"literal", "text":" HTTP/"},
{"type":"float", "name":"httpversion"},
{"type":"literal", "text":"\" "},
{"type":"number", "name":"response"},
{"type":"literal", "text":" "},
{"type":"number", "name":"bytes"}
]%

rule=access:%[
{"type":"@apache"},
{"type":"literal", "text":"\""},
{"type":"char-to", "name":"referrer", "extradata":"\""},
{"type":"literal", "text":"\""},
{"type":"char-to", "name":"useragent", "extradata":"\""}
]%
rule=access:%@apache%

[1] http://www.liblognorm.com/files/manual/configuration.html
[2] https://httpd.apache.org/docs/2.4/logs.html#accesslog



Re: [rsyslog] liblognorm vs grok

2016-12-05 Thread mosto...@gmail.com

Hi


Coming back to liblognorm, I have a few questions I'd love an expert
reply to.  0:D


*- Documentation [1] states how to define a type, but not how to use it. 
Are we properly using defined type "apache" in the configuration below?*


- Apache access logs seem to have 2 formats: common and combined [2]
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif 
HTTP/1.0" 200 2326
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif 
HTTP/1.0" 200 2326 "referrer" "useragent"

*How should we define our rulesets to have /optional/ fields?

- Our current workaround is to have a defined type and use it as part of 
a longer rule. Is that ok?*


*- How could we define logic to set a field to "0" when content is "-"?
(bytes field)*
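
The thread does not answer this question within liblognorm itself. One option outside liblognorm is to post-process the normalized JSON. The Python sketch below assumes the rule captures the field as a string (for example with a word parser, so that "-" matches at all); normalize_bytes is a hypothetical helper and not part of liblognorm or rsyslog.

```python
def normalize_bytes(event):
    """Return a copy of event with 'bytes' as an int; '-' becomes 0."""
    fixed = dict(event)
    raw = fixed.get("bytes")
    if raw == "-":
        fixed["bytes"] = 0
    elif isinstance(raw, str) and raw.isdigit():
        fixed["bytes"] = int(raw)
    return fixed

print(normalize_bytes({"response": "200", "bytes": "-"}))
# {'response': '200', 'bytes': 0}
print(normalize_bytes({"response": "200", "bytes": "2326"}))
# {'response': '200', 'bytes': 2326}
```

Events without a bytes field, or with an unexpected value, are returned unchanged.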


type=@apache:%[
{"type":"ipv4", "name":"ip"},
{"type":"literal", "text":" "},
{"type":"word", "name":"ident"},
{"type":"literal", "text":" "},
{"type":"word", "name":"user"},
{"type":"literal", "text":" ["},
{"type":"char-to", "name":"date", "extradata":"]"},
{"type":"literal", "text":"] \""},
{"type":"word", "name":"method"},
{"type":"literal", "text":" "},
{"type":"char-to", "name":"request", "extradata":" "},
{"type":"literal", "text":" HTTP/"},
{"type":"float", "name":"httpversion"},
{"type":"literal", "text":"\" "},
{"type":"number", "name":"response"},
{"type":"literal", "text":" "},
{"type":"number", "name":"bytes"}
]%

rule=access:%[
{"type":"@apache"},
{"type":"literal", "text":"\""},
{"type":"char-to", "name":"referrer", "extradata":"\""},
{"type":"literal", "text":"\""},
{"type":"char-to", "name":"useragent", "extradata":"\""}
]%
rule=access:%@apache%

[1] http://www.liblognorm.com/files/manual/configuration.html
[2] https://httpd.apache.org/docs/2.4/logs.html#accesslog


Re: [rsyslog] liblognorm vs grok

2016-12-05 Thread mosto...@gmail.com

Is that documentation stored on GitHub, like rsyslog's?

http://www.liblognorm.com/files/manual/index.html


On 05/12/16 at 11:15, David Lang wrote:

On Mon, 5 Dec 2016, mosto...@gmail.com wrote:


Hi.

Is there an online liblognorm tester to check the rules we are writing?

Otherwise, could you provide a testing guide 
(http://www.liblognorm.com/files/manual/installation.html#testing) to 
build lognormalizer to test?


the liblognorm package includes lognormalizer, but it doesn't put it 
in a place where it's picked up by the default path


/usr/lib/lognorm/lognormalizer




On 04/10/16 at 19:27, mosto...@gmail.com wrote:

Hi Radu


After reading 
http://lists.adiscon.net/pipermail/rsyslog/2013-December/035122.html 
and considering several years have passed, I would like to get some 
feedback of your experience, to help me choose between raw 
forwarding messages+logstash or split before forwarding with 
mmnormalize.


If this decision had to be made today, what would you have chosen?

Of course, everybody is welcome to join the thread.





Re: [rsyslog] liblognorm vs grok

2016-12-05 Thread David Lang

On Mon, 5 Dec 2016, mosto...@gmail.com wrote:


Hi.

Is there an online liblognorm tester to check the rules we are writing?

Otherwise, could you provide a testing guide 
(http://www.liblognorm.com/files/manual/installation.html#testing) to 
build lognormalizer to test?


the liblognorm package includes lognormalizer, but it doesn't put it in a place 
where it's picked up by the default path


/usr/lib/lognorm/lognormalizer




On 04/10/16 at 19:27, mosto...@gmail.com wrote:

Hi Radu


After reading 
http://lists.adiscon.net/pipermail/rsyslog/2013-December/035122.html 
and considering several years have passed, I would like to get some 
feedback of your experience, to help me choose between raw forwarding 
messages+logstash or split before forwarding with mmnormalize.


If this decision had to be made today, what would you have chosen?

Of course, everybody is welcome to join the thread.




Re: [rsyslog] liblognorm vs grok

2016-12-05 Thread mosto...@gmail.com

Hi.

Is there an online liblognorm tester to check the rules we are writing?

Otherwise, could you provide a testing guide 
(http://www.liblognorm.com/files/manual/installation.html#testing) to 
build lognormalizer to test?



On 04/10/16 at 19:27, mosto...@gmail.com wrote:

Hi Radu


After reading 
http://lists.adiscon.net/pipermail/rsyslog/2013-December/035122.html 
and considering several years have passed, I would like to get some 
feedback of your experience, to help me choose between raw forwarding 
messages+logstash or split before forwarding with mmnormalize.


If this decision had to be made today, what would you have chosen?

Of course, everybody is welcome to join the thread.




Re: [rsyslog] liblognorm vs grok

2016-10-27 Thread David Lang

On Sat, 8 Oct 2016, Radu Gheorghe wrote:


That's right, it's not so much about problems as
convenience/flexibility. For example, with grok.regex you can specify
optional fields right in the middle of the pattern. With
liblognorm/mmnormalize I have to repeat that rule with and without
that field. If you have 5 of those... you get quite a combinatorial
explosion.

Maybe this particular one is already possible with liblognorm v2? But
anyway, this is just an example. Though I'm looking forward to work
[more] with v2 because it seems much more flexible than v1 indeed.
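
The combinatorial explosion described above can be made concrete with a short Python sketch: with n independent optional fields, writing one rule per combination needs 2**n rules. rule_variants and the single-letter field names are hypothetical and only illustrate the counting; no liblognorm syntax is generated.

```python
from itertools import product

def rule_variants(fields, optional):
    """Yield one field list per present/absent combination of the optional fields."""
    for keep in product([True, False], repeat=len(optional)):
        dropped = {name for name, kept in zip(optional, keep) if not kept}
        yield [f for f in fields if f not in dropped]

two = list(rule_variants(["a", "b", "c", "d", "e"], ["b", "d"]))
print(len(two))   # 4

five = list(rule_variants(list("abcdefg"), list("bcdef")))
print(len(five))  # 32
```

Fields that always appear together (such as a trailing pair) count as a single optional unit, so the real number of rules can be lower than this upper bound.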


Yep, one of the things you can specify in the v2 language is alternatives, and
you can do this either inside a single rule or as a type that you then use in a 
rule.


v2 is a massive improvement in terms of flexibility compared to v1

David Lang


--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Oct 7, 2016 at 9:56 AM, Rainer Gerhards
 wrote:

Not speaking for Radu, but I think he does not have problems; rather, Grok
rules seem more convenient, and often that's really what they are. That's
where custom types come in: if you have a good base set, then it really is
not much difference in convenience. Unfortunately we don't have this yet.

Rainer

Sent from phone, thus brief.

On 07.10.2016 at 18:53, "Joe Blow" wrote:


Hey Radu,

Long time listener, first time caller :).  What problems did you have with
mmnormalize?

Cheers,

JB

On Fri, Oct 7, 2016 at 12:43 PM, Rainer Gerhards wrote:

Hi Radu


After reading
http://lists.adiscon.net/pipermail/rsyslog/2013-December/035122.html
and considering several years have passed, I would like to get some
feedback of your experience, to help me choose between raw forwarding
messages+logstash or split before forwarding with mmnormalize.

If this decision had to be made today, what would you have chosen?

Of course, everybody is welcome to join the thread.


Re: [rsyslog] liblognorm vs grok

2016-10-19 Thread Brian Knox
Getting some ideas from reading this. Thank you!

On Tue, Oct 18, 2016 at 3:22 AM Radu Gheorghe 
wrote:

> It looks very very very very nice, Rainer! Thanks for publishing!
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Mon, Oct 17, 2016 at 4:53 PM, Rainer Gerhards
>  wrote:
> > It took a while, but finally the thesis is online:
> >
> >
> https://www.fernuni-hagen.de/imperia/md/content/rechnerarchitektur/rainer_gerhards.pdf
> >
> > Rainer
> >
> > 2016-10-06 11:32 GMT+02:00 Rainer Gerhards :
> >> 2016-10-06 11:23 GMT+02:00 mosto...@gmail.com :
> >>>
> >>>
> >>>>> Totally agree...(actually, liblognorm is giving me segfaults :P)
> >>>>
> >>>> I'll try to check next week when my current task is done.
> >>>
> >>> I know you're busy...trying to contribute as much as I can with everything
> >>> I deal with on my daily work.
> >>>
> >>>>
> >>>> Liblognorm is based on work from my MSc Thesis. The thesis paper is
> >>>> currently being processed for upload, I expect it to be available next
> >>>> week. If you'd like to dig down to the details and an explanation why it is
> >>>> faster, the thesis will have it in great detail. I can post a link once it
> >>>> is online.
> >>>
> >>> Cool
> >>> I guess it should be similar to what a firewall does when it "compiles"
> >>> the rules.
> >>
> >>
> >> Ah, not really, as here we have text detection, which is a different story
> >> for a firewall...
> >>
> >> Rainer


Re: [rsyslog] liblognorm vs grok

2016-10-17 Thread Rainer Gerhards
It took a while, but finally the thesis is online:

https://www.fernuni-hagen.de/imperia/md/content/rechnerarchitektur/rainer_gerhards.pdf

Rainer

2016-10-06 11:32 GMT+02:00 Rainer Gerhards :
> 2016-10-06 11:23 GMT+02:00 mosto...@gmail.com :
>>
>>
>>>> Totally agree...(actually, liblognorm is giving me segfaults :P)
>>>
>>> I'll try to check next week when my current task is done.
>>
>> I know you're busy...trying to contribute as much as I can with everything
>> I deal with on my daily work.
>>
>>>
>>> Liblognorm is based on work from my MSc Thesis. The thesis paper is
>>> currently being processed for upload, I expect it to be available next
>>> week. If you'd like to dig down to the details and an explanation why it
>>> is
>>> faster, the thesis will have it in great detail. I can post a link once
>>> it
>>> is online.
>>
>> Cool
>> I guess it should be similar to what a firewall does when it "compiles"
>> the rules.
>
>
> Ah, not really, as here we have text detection, which is a different story
> for a firewall...
>
> Rainer


Re: [rsyslog] liblognorm vs grok

2016-10-07 Thread Rainer Gerhards
Not speaking for Radu, but I think he does not have problems; rather, Grok
rules seem more convenient, and often that's really what they are. That's
where custom types come in: if you have a good base set, then there really is
not much difference in convenience. Unfortunately we don't have this yet.

Rainer

Sent from phone, thus brief.

On 07.10.2016 at 18:53, "Joe Blow" wrote:

> Hey Radu,
>
> Long time listener, first time caller :).  What problems did you have with
> mmnormalize?
>
> Cheers,
>
> JB
>
> On Fri, Oct 7, 2016 at 12:43 PM, Rainer Gerhards wrote:
>
> > Just to spread the idea: v2 has custom data types and if used correctly,
> > they provide much of the flexibility of Grok. Unfortunately nobody has yet
> > had time to create a set of standard primitive types...
> >
> > Rainer
> >
> > Sent from phone, thus brief.
> >
> > On 07.10.2016 at 18:38, "Radu Gheorghe" wrote:
> >
> > > Hi,
> > >
> > > In the meantime I had quite a lot of experience with both. It sounds
> > > like my initial thoughts were pretty good: mmnormalize is A LOT faster
> > > but less flexible than grok (remember there's mmgrok as well - though
> > > it's quite young and there are no packages, you need to compile
> > > manually).
> > >
> > > We've also done some performance testing, if you're interested in numbers:
> > > https://sematext.com/blog/2015/10/16/large-scale-log-analytics-with-solr/
> > >
> > > So I guess at the end of the day it depends on the use-case. In our
> > > production we do a bit of mmnormalize, but with clients
> > > (https://sematext.com/consulting/logging/) I've used both, depending
> > > on the requirements. If I need something quick (as in "short
> > > development time") and performance isn't critical, I tend to go with
> > > Logstash and grok. If I need something fast, it may be worth spending
> > > a bit of time and setting liblognorm rules right.
> > >
> > > Best regards,
> > > Radu
> > > --
> > > Performance Monitoring * Log Analytics * Search Analytics
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > > On Tue, Oct 4, 2016 at 10:27 AM, mosto...@gmail.com wrote:
> > > > Hi Radu
> > > >
> > > >
> > > > After reading
> > > > http://lists.adiscon.net/pipermail/rsyslog/2013-December/035122.html
> > > > and considering several years have passed, I would like to get some
> > > > feedback on your experience, to help me choose between forwarding raw
> > > > messages to Logstash, or splitting them into fields with mmnormalize
> > > > before forwarding.
> > > >
> > > > If this decision had to be made today, what would you have chosen?
> > > >
> > > > Of course, everybody is welcome to join the thread.
> > > >


Re: [rsyslog] liblognorm vs grok

2016-10-06 Thread Rainer Gerhards
2016-10-06 11:23 GMT+02:00 mosto...@gmail.com :

>>> Totally agree...(actually, liblognorm is giving me segfaults :P)
>>
>> I'll try to check next week when my current task is done.
>
> I know you're busy...trying to contribute as much as I can with everything
> I deal with on my daily work.
>
>> Liblognorm is based on work from my MSc Thesis. The thesis paper is
>> currently being processed for upload, I expect it to be available next
>> week. If you'd like to dig down to the details and an explanation why it
>> is faster, the thesis will have it in great detail. I can post a link once
>> it is online.
>
> Cool
> I guess it should be similar to what a firewall does when it "compiles"
> the rules.
>

Ah, not really, as here we have text detection, which is a different story
for a firewall...

Rainer


Re: [rsyslog] liblognorm vs grok

2016-10-06 Thread mosto...@gmail.com



>> Totally agree...(actually, liblognorm is giving me segfaults :P)
>
> I'll try to check next week when my current task is done.

I know you're busy...trying to contribute as much as I can with
everything I deal with on my daily work.

> Liblognorm is based on work from my MSc Thesis. The thesis paper is
> currently being processed for upload, I expect it to be available next
> week. If you'd like to dig down to the details and an explanation why it is
> faster, the thesis will have it in great detail. I can post a link once it
> is online.

Cool
I guess it should be similar to what a firewall does when it "compiles"
the rules.




Re: [rsyslog] liblognorm vs grok

2016-10-06 Thread Rainer Gerhards
2016-10-06 10:42 GMT+02:00 mosto...@gmail.com :

>
>
> On 04/10/16 at 20:31, Joe Blow wrote:
>
>> 
>>
>> Regex should be avoided like the plague, at all costs.  If you know your
>> logs well enough to write a regex for them, why wouldn't you write a
>> liblognorm rule instead?
>>
> Totally agree...(actually, liblognorm is giving me segfaults :P)


I'll try to check next week when my current task is done.

>
>
>> I use liblognorm + rsyslog to forward to ES with very little overhead.  If
>> you like performance and scalability, use liblognorm.
>>
> Ok
>
>> If you got a free
>> Logstash T-shirt from a conference you went to, use Logstash.  At the end
>> of the day rsyslog has a great set of output plugins (mongo, ES, kafka,
>> etc.) so if you get your output into JSON, you're laughing.  Liblognorm
>> does this faster/better/stronger than grok.
>>
> Almost convinced...I'll love to hear more voices anyway
>

Liblognorm is based on work from my MSc Thesis. The thesis paper is
currently being processed for upload, I expect it to be available next
week. If you'd like to dig down to the details and an explanation why it is
faster, the thesis will have it in great detail. I can post a link once it
is online.

HTH
Rainer

> Thanks a lot
>
>

Re: [rsyslog] liblognorm vs grok

2016-10-06 Thread mosto...@gmail.com



On 04/10/16 at 20:31, Joe Blow wrote:

> Regex should be avoided like the plague, at all costs.  If you know your
> logs well enough to write a regex for them, why wouldn't you write a
> liblognorm rule instead?

Totally agree...(actually, liblognorm is giving me segfaults :P)

> I use liblognorm + rsyslog to forward to ES with very little overhead.  If
> you like performance and scalability, use liblognorm.

Ok

> If you got a free
> Logstash T-shirt from a conference you went to, use Logstash.  At the end
> of the day rsyslog has a great set of output plugins (mongo, ES, kafka,
> etc.) so if you get your output into JSON, you're laughing.  Liblognorm
> does this faster/better/stronger than grok.

Almost convinced...I'd love to hear more voices anyway.
Thanks a lot



Re: [rsyslog] liblognorm vs grok

2016-10-04 Thread Joe Blow


Regex should be avoided like the plague, at all costs.  If you know your
logs well enough to write a regex for them, why wouldn't you write a
liblognorm rule instead?

I use liblognorm + rsyslog to forward to ES with very little overhead.  If
you like performance and scalability, use liblognorm.  If you got a free
Logstash T-shirt from a conference you went to, use Logstash.  At the end
of the day rsyslog has a great set of output plugins (mongo, ES, kafka,
etc.) so if you get your output into JSON, you're laughing.  Liblognorm
does this faster/better/stronger than grok.



Cheers,

JB

On Tue, Oct 4, 2016 at 1:27 PM, mosto...@gmail.com wrote:

> Hi Radu
>
>
> After reading
> http://lists.adiscon.net/pipermail/rsyslog/2013-December/035122.html
> and considering several years have passed, I would like to get some
> feedback on your experience, to help me choose between forwarding raw
> messages to Logstash, or splitting them into fields with mmnormalize
> before forwarding.
>
> If this decision had to be made today, what would you have chosen?
>
> Of course, everybody is welcome to join the thread.
>


[rsyslog] liblognorm vs grok

2016-10-04 Thread mosto...@gmail.com

Hi Radu


After reading
http://lists.adiscon.net/pipermail/rsyslog/2013-December/035122.html and
considering several years have passed, I would like to get some feedback
on your experience, to help me choose between forwarding raw messages to
Logstash, or splitting them into fields with mmnormalize before forwarding.


If this decision had to be made today, what would you have chosen?

Of course, everybody is welcome to join the thread.



[rsyslog] liblognorm vs grok

2013-12-04 Thread Radu Gheorghe
Hi list :)

I'm trying to understand if mmnormalize is a good fit for parsing a high
volume of logs, given the fact that events are really heterogeneous (think
log4j logs, apache logs, whatever logs are commonly produced).

My only frame of reference is Logstash's grok filter
(http://logstash.net/docs/1.2.2/filters/grok),
which allows you to tag regular expressions in a dictionary, and then use
those tags to match fields from logs, and put them in a structured event.
Much like how you'd build a liblognorm rulebase.

If I got it right, the advantage of mmnormalize seems to be performance,
because it avoids regular expressions. I'm not sure how this
actually works, though. Practically, it sounds like this comes at the
expense of flexibility: if I need to add a new pattern in liblognorm
(say, a new date format) I'd have to patch the library itself, no?

If that's the case, it looks like grok would be more suitable for a
heterogeneous environment, because you can just add/remove patterns at
will. There's also a matter of popularity, because grok is quite widely
used, so you can find ready-made dictionaries and rules quite easily. It's
not only about Logstash, as Apache Flume uses a library called Morphlines
which also implements grok:
http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/

Basically, my question is whether liblognorm/mmnormalize can be made
flexible enough to handle the common logging formats out there, or is it
scoped to be a performance-oriented thing for specific use-cases?

Speaking of scope, can liblognorm be enhanced to support parsing multiline
messages? This seems to be possible in grok:
https://logstash.jira.com/browse/LOGSTASH-692

For me, it's important to understand whether I should put effort in working
with mmnormalize and sponsor needed enhancements, or would sponsoring a new
mmgrok module be a better idea for my use-case. Because it looks like
grok is available as a C library as well:
https://github.com/jordansissel/grok

Best regards,
Radu


Re: [rsyslog] liblognorm vs grok

2013-12-04 Thread David Lang

On Wed, 4 Dec 2013, Radu Gheorghe wrote:


> Hi list :)
>
> I'm trying to understand if mmnormalize is a good fit for parsing a high
> volume of logs, given the fact that events are really heterogeneous (think
> log4j logs, apache logs, whatever logs are commonly produced).
>
> My only frame of reference is Logstash's grok filter
> (http://logstash.net/docs/1.2.2/filters/grok),
> which allows you to tag regular expressions in a dictionary, and then use
> those tags to match fields from logs, and put them in a structured event.
> Much like how you'd build a liblognorm rulebase.
>
> If I got it right, the advantage of mmnormalize seems to be performance,
> because it avoids regular expressions. I'm not sure how this
> actually works, though. Practically, it sounds like this comes at the
> expense of flexibility: if I need to add a new pattern in liblognorm
> (say, a new date format) I'd have to patch the library itself, no?


For a completely new type of data you would have to modify the library, but you 
seldom need to do that because when you are processing the logs, all you really 
care about is that this string of characters is the date; you aren't parsing the 
date so that you can do calculations on it.


As long as you can say 'this string of characters is what I care about, and I'm 
going to label it date' you are in good shape.


mmnormalize is far better than regex engines for a couple of reasons.

1. full regex support requires supporting some very expensive types of 
expressions, even if you don't plan to use them. This costs.


2. regex engines almost always go down the list: does regex1 match? If not, does 
regex2 match? If not, does regex3 match? And so on.


mmnormalize in comparison compiles your config into a parse tree, so it can walk 
down the log message a character at a time, looking that character up in the 
parse tree, and when it comes to the end of the line it knows it has the correct 
match. So instead of being O(N) in the number of rules, it's O(1) in the number 
of rules; the cost depends only on the (relatively) short length of the lines.
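
To make the difference concrete, here is a minimal Python sketch of the two approaches described above. It is only an illustration and not how liblognorm is actually implemented: the rule names, the `Node` class and the single "word" field type are all made up for this example, and a shared word position simply keeps the first field name it sees.

```python
import re

# Approach 1: a list of regexes tried one after another.
# The work grows with the number of rules.
REGEX_RULES = [
    ("login", re.compile(r"user (?P<user>\S+) logged in from (?P<ip>\S+)")),
    ("logout", re.compile(r"user (?P<user>\S+) logged out")),
]

def match_regex_list(line):
    for name, pattern in REGEX_RULES:
        m = pattern.fullmatch(line)
        if m:
            return name, m.groupdict()
    return None

# Approach 2: all rules merged into one tree (hypothetical structure).
class Node:
    def __init__(self):
        self.children = {}  # next literal character -> Node
        self.word = None    # (field name, Node) for a "word" field, or None
        self.rule = None    # name of the rule that ends at this node

def add_rule(root, name, tokens):
    """Tokens are literal strings or ("word", field_name) tuples."""
    node = root
    for tok in tokens:
        if isinstance(tok, str):
            for ch in tok:
                node = node.children.setdefault(ch, Node())
        else:
            if node.word is None:
                node.word = (tok[1], Node())
            node = node.word[1]
    node.rule = name

def match_tree(node, line, pos=0, fields=None):
    """Walk the line once; rules with a common prefix share the same path."""
    fields = fields or {}
    if pos == len(line):
        return (node.rule, fields) if node.rule else None
    child = node.children.get(line[pos])
    if child is not None:
        found = match_tree(child, line, pos + 1, fields)
        if found:
            return found
    if node.word is not None:
        field, child = node.word
        end = line.find(" ", pos)
        end = len(line) if end == -1 else end
        if end > pos:  # a word must contain at least one character
            return match_tree(child, line, end, {**fields, field: line[pos:end]})
    return None

ROOT = Node()
add_rule(ROOT, "login", ["user ", ("word", "user"), " logged in from ", ("word", "ip")])
add_rule(ROOT, "logout", ["user ", ("word", "user"), " logged out"])

print(match_tree(ROOT, "user alice logged in from 10.0.0.1"))
print(match_regex_list("user bob logged out"))
```

Both rules share the "user <word> logged " prefix in the tree, so adding more rules with common prefixes adds branches rather than additional full passes over the line.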



> Speaking of scope, can liblognorm be enhanced to support parsing multiline
> messages? This seems to be possible in grok:
> https://logstash.jira.com/browse/LOGSTASH-692


Multiline logs cause all sorts of problems. In general you should avoid them or 
collapse the multiline logs into a single line when you get them into your 
logging system; too many things will break a multiline log into multiple logs. 
In some cases you can carefully configure everything to handle multiline logs, 
but it's very fragile and prevents you from using many tools and transport 
mechanisms.
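
As a rough illustration of what "collapsing" could look like, here is a small Python sketch. The escaping convention (backslash-n for a newline, doubled backslash for a backslash) is an assumption chosen for this example, not something rsyslog or liblognorm prescribes; whatever reads the log downstream would have to know the same convention.

```python
def collapse(message: str) -> str:
    """Turn a multiline message into one line (hypothetical convention)."""
    # Escape backslashes first so the transformation can be reversed.
    return message.replace("\\", "\\\\").replace("\n", "\\n")

def expand(line: str) -> str:
    """Reverse collapse(): restore the original newlines and backslashes."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

trace = "Exception in thread main\n  at Foo.bar(Foo.java:1)\n  path C:\\temp\\new"
one_line = collapse(trace)
print(one_line)
print(expand(one_line) == trace)
```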



> For me, it's important to understand whether I should put effort in working
> with mmnormalize and sponsor needed enhancements, or would sponsoring a new
> mmgrok module be a better idea for my use-case. Because it looks like
> grok is available as a C library as well:
> https://github.com/jordansissel/grok


It's not clear what enhancements you think you need (other than the 
multiline support, which as I say is problematic).


David Lang


Re: [rsyslog] liblognorm vs grok

2013-12-04 Thread Radu Gheorghe
Hi David,

Thanks a lot for your reply! I will add my comments inline.

2013/12/4 David Lang da...@lang.hm

> On Wed, 4 Dec 2013, Radu Gheorghe wrote:
>
>> Hi list :)
>>
>> I'm trying to understand if mmnormalize is a good fit for parsing a high
>> volume of logs, given the fact that events are really heterogeneous
>> (think log4j logs, apache logs, whatever logs are commonly produced).
>>
>> My only frame of reference is Logstash's grok filter
>> (http://logstash.net/docs/1.2.2/filters/grok),
>> which allows you to tag regular expressions in a dictionary, and then use
>> those tags to match fields from logs, and put them in a structured event.
>> Much like how you'd build a liblognorm rulebase.
>>
>> If I got it right, the advantage of mmnormalize seems to be performance,
>> because it avoids regular expressions. I'm not sure how this
>> actually works, though. Practically, it sounds like this comes at the
>> expense of flexibility: if I need to add a new pattern in liblognorm
>> (say, a new date format) I'd have to patch the library itself, no?
>
> For a completely new type of data you would have to modify the library, but
> you seldom need to do that because when you are processing the logs, all you
> really care about is that this string of characters is the date; you aren't
> parsing the date so that you can do calculations on it.


So you're basically saying that if I just want to copy-paste a new date,
I can simply say word or char-to and it should work. If I need to parse
an SQL date and send it over, for example as an ISO date, I need a new type
and therefore liblognorm needs patching. Right?

If so, this means that I can either make do with the field types that exist,
or patch liblognorm. That was my initial assumption, which leaves me a bit
undecided. On one hand, the current set of field types looks like it would
suit 99.9% of the logs out there. On the other hand, you don't
really know until you try. I tried to use mmnormalize a few months
ago in my setup and I failed because it didn't have something to match the
string until the end of the line. Now it has, so I'm going to give it a
second shot. But God knows what will be coming up next. So it would be nice
to have an easy way to define new field types.

I'm guessing this is a design thing. You need to have those specific
types if you want to have the awesome performance. Right?



> As long as you can say 'this string of characters is what I care about,
> and I'm going to label it date' you are in good shape.
>
> mmnormalize is far better than regex engines for a couple of reasons.
>
> 1. full regex support requires supporting some very expensive types of
> expressions, even if you don't plan to use them. This costs.
>
> 2. regex engines almost always go down the list, does regex1 match, if not
> does regex2 match, if not does regex3 match, 
>
> mmnormalize in comparison compiles your config into a parse tree, so it
> can walk down the log message a character at a time, looking that character
> up in the parse tree and when it comes to the end of the line it knows it
> has the correct match, so instead of being O(N) based on the number of
> rules it's O(1) in the number of rules, based only on the (relatively)
> short length of the lines.


Thanks for the explanation. This makes a lot of sense. So it should really
be A LOT faster, which would make a lot of difference at scale.




>> Speaking of scope, can liblognorm be enhanced to support parsing multiline
>> messages? This seems to be possible in grok:
>> https://logstash.jira.com/browse/LOGSTASH-692
>
> multiline logs cause all sorts of problems, in general you should avoid
> them or collapse the multiline logs into a single line when you get it into
> your logging system, too many things will break a multiline log into
> multiple logs. In some cases you can carefully configure everything to
> handle multiline logs, but it's very fragile and prevents you from using
> many tools and transport mechanisms.


Yeah, I know these tend to be a pain. But I have to deal with them.
Collapsing sounds like a hack to me because I need to be aware of what I'm
doing down the pipeline. For example, something else that works with the
log, like a UI, would need to know that the strange character is actually
a newline. I'll probably also have to escape it... The whole thing sounds
more complicated (and hackier) than dealing with the newline itself.
Especially since, right now at least, from rsyslog my events go to
Elasticsearch (probably something else in future, like HDFS) and then
Kibana and some other UI. All these have no problem handling multi-line
events, so if rsyslog works with them, too, I'll be good.




>> For me, it's important to understand whether I should put effort in
>> working with mmnormalize and sponsor needed enhancements, or would
>> sponsoring a new mmgrok module be a better idea for my use-case. Because
>> it looks like grok is available as a C library as well:
>> https://github.com/jordansissel/grok
>
> It's not clear what enhancements you are thinking that you 

Re: [rsyslog] liblognorm vs grok

2013-12-04 Thread David Lang

On Wed, 4 Dec 2013, Radu Gheorghe wrote:


> Hi David,
>
> Thanks a lot for your reply! I will add my comments inline.
>
> 2013/12/4 David Lang da...@lang.hm
>
>> On Wed, 4 Dec 2013, Radu Gheorghe wrote:
>>
>>> Hi list :)
>>>
>>> I'm trying to understand if mmnormalize is a good fit for parsing a high
>>> volume of logs, given the fact that events are really heterogeneous
>>> (think log4j logs, apache logs, whatever logs are commonly produced).
>>>
>>> My only frame of reference is Logstash's grok filter
>>> (http://logstash.net/docs/1.2.2/filters/grok),
>>> which allows you to tag regular expressions in a dictionary, and then use
>>> those tags to match fields from logs, and put them in a structured event.
>>> Much like how you'd build a liblognorm rulebase.
>>>
>>> If I got it right, the advantage of mmnormalize seems to be performance,
>>> because it avoids regular expressions. I'm not sure how this
>>> actually works, though. Practically, it sounds like this comes at the
>>> expense of flexibility: if I need to add a new pattern in liblognorm
>>> (say, a new date format) I'd have to patch the library itself, no?
>>
>> For a completely new type of data you would have to modify the library, but
>> you seldom need to do that because when you are processing the logs, all you
>> really care about is that this string of characters is the date; you aren't
>> parsing the date so that you can do calculations on it.
>
> So you're basically saying that if I just want to copy-paste a new date,
> I can simply say word or char-to and it should work. If I need to parse
> an SQL date and send it over, for example as an ISO date, I need a new type
> and therefore liblognorm needs patching. Right?


Remember that everything is just a string until it's interpreted.

I believe that if you set a variable to the date and then use that variable in a 
template with a timestamp formatting option, it will get interpreted at that 
point (and if not, sponsoring that feature will be far more valuable than 
another parsing type in liblognorm :-)
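
The idea of "capture it as a plain string, interpret it only when you format the output" can be sketched in a few lines of Python. This only illustrates the principle; it is not rsyslog's template mechanism, and the event layout and the `sql_to_iso` helper are invented for the example.

```python
from datetime import datetime

def sql_to_iso(value: str) -> str:
    """Interpret an SQL-style timestamp string and re-emit it as ISO 8601."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").isoformat()

# At parse time the date is just a labelled run of characters.
event = {"date": "2013-12-04 10:15:30", "msg": "user alice logged in"}

# Only at output time is the string interpreted and reformatted.
print(sql_to_iso(event["date"]))
```

The parser only has to find where the date starts and ends; the conversion happens in a separate, later step.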



> If so, this means that I can either make do with the field types that exist,
> or patch liblognorm. That was my initial assumption, which leaves me a bit
> undecided. On one hand, the current set of field types looks like it would
> suit 99.9% of the logs out there. On the other hand, you don't
> really know until you try. I tried to use mmnormalize a few months
> ago in my setup and I failed because it didn't have something to match the
> string until the end of the line. Now it has, so I'm going to give it a
> second shot. But God knows what will be coming up next. So it would be nice
> to have an easy way to define new field types.
>
> I'm guessing this is a design thing. You need to have those specific
> types if you want to have the awesome performance. Right?


I believe so. I guess it's possible to introduce a language that could be 
compiled down to something efficient at ruleset load time, but that would be 
adding a lot of complexity, and unless someone can show a need for it, it's 
unlikely to happen.



>> As long as you can say 'this string of characters is what I care about,
>> and I'm going to label it date' you are in good shape.
>>
>> mmnormalize is far better than regex engines for a couple of reasons.
>>
>> 1. full regex support requires supporting some very expensive types of
>> expressions, even if you don't plan to use them. This costs.
>>
>> 2. regex engines almost always go down the list, does regex1 match, if not
>> does regex2 match, if not does regex3 match, 
>>
>> mmnormalize in comparison compiles your config into a parse tree, so it
>> can walk down the log message a character at a time, looking that character
>> up in the parse tree and when it comes to the end of the line it knows it
>> has the correct match, so instead of being O(N) based on the number of
>> rules it's O(1) in the number of rules, based only on the (relatively)
>> short length of the lines.
>
> Thanks for the explanation. This makes a lot of sense. So it should really
> be A LOT faster, which would make a lot of difference at scale.


when you are using 'hello world' type examples you aren't going to see a 
difference, but if you load up hundreds to thousands of rules, you will see a 
huge difference.







>>> Speaking of scope, can liblognorm be enhanced to support parsing multiline
>>> messages? This seems to be possible in grok:
>>> https://logstash.jira.com/browse/LOGSTASH-692
>>
>> multiline logs cause all sorts of problems, in general you should avoid
>> them or collapse the multiline logs into a single line when you get it into
>> your logging system, too many things will break a multiline log into
>> multiple logs. In some cases you can carefully configure everything to
>> handle multiline logs, but it's very fragile and prevents you from using
>> many tools and transport mechanisms.
>
> Yeah, I know these tend to be a pain. But I have to deal with them.
> Collapsing sounds like a hack to me because I need to be aware of what I'm
> doing down the pipeline. For example, something else that works with the
> log, like a UI, would need to know that the strange character is actually
> a 

Re: [rsyslog] liblognorm vs grok

2013-12-04 Thread Radu Gheorghe
Thanks a lot, David! This clears up a lot of stuff.

I'll start using mmnormalize then, and I'll bug you guys again if I bump
into issues :)


2013/12/4 David Lang da...@lang.hm

 On Wed, 4 Dec 2013, Radu Gheorghe wrote:

  Hi David,

 Thanks a lot for your reply! I will add my comments inline.

 2013/12/4 David Lang da...@lang.hm

  On Wed, 4 Dec 2013, Radu Gheorghe wrote:

  Hi list :)


 I'm trying to understand if mmnormalize is a good fit for parsing a high
 traffic of logs, given that events are really heterogeneous (think
 log4j logs, Apache logs, whatever logs are commonly produced).

 My only frame of reference is Logstash's grok filter
 (http://logstash.net/docs/1.2.2/filters/grok),
 which allows you to tag regular expressions in a dictionary, and then use
 those tags to match fields from logs, and put them in a structured event.
 Much like how you'd build a liblognorm rulebase.

 If I got it right, the advantage of mmnormalize seems to be performance,
 because it avoids regular expressions. Not sure how this actually works,
 though. Practically, it sounds like this comes at the expense of
 flexibility: if I need to add a new pattern in liblognorm
 (say, a new date format) I'd have to patch the library itself, no?


 For a completely new type of data you would have to modify the library, but
 you seldom need to do that, because when you are processing the logs all you
 really care about is that this string of characters is the date; you aren't
 parsing the date so that you can do calculations on it.


 So you're basically saying that if I just want to copy-paste a new date,
 I can simply say word or char-to and it should work. If I need to parse
 an SQL date and send it over, for example as an ISO date, I need a new type
 and therefore liblognorm needs patching. Right?


 remember that everything is just a string until it's interpreted.

 I believe that if you set a variable to the date and then use that
 variable in a template with a timestamp formatting option, it will get
 interpreted at that point (and if not, sponsoring that feature will be far
 more valuable than another parsing type in liblognorm :-)
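
To illustrate the "just a string until it's interpreted" point outside of rsyslog, here is a minimal Python sketch. It is not rsyslog or liblognorm code; the function names `capture_date` and `to_iso` and the sample log line are made up for the example. The capture step only labels a run of characters as the date, and the conversion from an SQL-style date to ISO happens later, at output time.

```python
from datetime import datetime

def capture_date(line: str) -> dict:
    """Label the first two space-separated tokens as 'date' without interpreting them."""
    date_part, time_part, msg = line.split(" ", 2)
    return {"date": f"{date_part} {time_part}", "msg": msg}

def to_iso(sql_date: str) -> str:
    """Interpret the captured string only when formatting the output."""
    return datetime.strptime(sql_date, "%Y-%m-%d %H:%M:%S").isoformat()

# Hypothetical log line, used only for illustration.
event = capture_date("2013-12-04 10:44:00 connection accepted")
print(event["date"])          # still an opaque string at this point
print(to_iso(event["date"]))  # interpreted as a timestamp only here
```

Whether an rsyslog template can do the equivalent conversion on a normalized field is exactly the open question in the paragraph above; the sketch only shows the separation between capturing and interpreting.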


  If so, this means that I can either make do with the field types that
 exist, or patch liblognorm. That was my initial assumption, which leaves me
 a bit undecided. On one hand, the current set of field types looks like it
 would suit 99.9% of the logs out there. On the other hand, you don't
 really know until you try. I tried to use mmnormalize a few months
 ago in my setup and I failed because it didn't have something to match the
 string until the end of the line. Now it has, so I'm going to give it a
 second shot. But God knows what will be coming up next. So it would be nice
 to have an easy way to define new field types.

 I'm guessing this is a design thing. You need to have those specific
 types if you want to have the awesome performance. Right?


 I believe so. I guess it's possible to introduce a language that could be
 compiled down to something efficient at ruleset load time, but that would
 be adding a lot of complexity, and unless someone can show a need for it,
 it's unlikely to happen.


  As long as you can say 'this string of characters is what I care about,
 and I'm going to label it date' you are in good shape.

 mmnormalize is far better than regex engines for a couple of reasons.

 1. full regex support requires supporting some very expensive types of
 expressions, even if you don't plan to use them. This costs.

 2. regex engines almost always go down the list: does regex1 match, if not
 does regex2 match, if not does regex3 match, and so on.

 mmnormalize in comparison compiles your config into a parse tree, so it
 can walk down the log message a character at a time, looking that character
 up in the parse tree, and when it comes to the end of the line it knows it
 has the correct match. So instead of being O(N) based on the number of
 rules, it's O(1) in the number of rules, depending only on the (relatively)
 short length of the lines.
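
The parse-tree idea can be sketched in a few lines of Python. This is a simplified illustration, not liblognorm's actual implementation: it walks whole whitespace-separated words instead of single characters, it only knows literals and a one-word field written as `%name%`, and the names `Node`, `add_rule` and `match_tree` are invented for the example. The point it shows is that rules sharing a prefix share a path in the tree, so a message is matched by one walk over its own tokens rather than by trying each rule in turn.

```python
class Node:
    def __init__(self):
        self.literals = {}  # literal token -> child Node
        self.fields = {}    # field name -> child Node (matches any one word)
        self.rule = None    # name of the rule that ends at this node, if any

def add_rule(root, name, pattern):
    """Merge one rule into the shared tree; common prefixes reuse nodes."""
    node = root
    for tok in pattern.split():
        if len(tok) > 2 and tok.startswith("%") and tok.endswith("%"):
            node = node.fields.setdefault(tok[1:-1], Node())
        else:
            node = node.literals.setdefault(tok, Node())
    node.rule = name

def match_tree(root, message):
    """Walk the message once through the tree; return (rule, fields) or None."""
    tokens = message.split()

    def walk(node, i, fields):
        if i == len(tokens):
            return (node.rule, fields) if node.rule else None
        tok = tokens[i]
        child = node.literals.get(tok)  # prefer an exact literal match
        if child is not None:
            hit = walk(child, i + 1, fields)
            if hit:
                return hit
        for fname, fchild in node.fields.items():  # otherwise capture a field
            hit = walk(fchild, i + 1, {**fields, fname: tok})
            if hit:
                return hit
        return None

    return walk(root, 0, {})

# Hypothetical rules, for illustration only.
root = Node()
add_rule(root, "login", "user %user% logged in from %ip%")
add_rule(root, "logout", "user %user% logged out")
print(match_tree(root, "user alice logged out"))
```

Both rules share the path `user` / `%user%` / `logged`, so adding more rules that start the same way does not add more work for that part of the message. A list of regexes would instead rerun from the start of the message for every rule it tries.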



 Thanks for the explanation. This makes a lot of sense. So it should really
 be A LOT faster, which would make a lot of difference at scale.


 when you are using 'hello world' type examples you aren't going to see a
 difference, but if you load up hundreds to thousands of rules, you will see
 a huge difference.





  Speaking of scope, can liblognorm be enhanced to support parsing multiline
 messages? This seems to be possible in grok:
 https://logstash.jira.com/browse/LOGSTASH-692


 Multiline logs cause all sorts of problems. In general you should avoid
 them, or collapse them into a single line when they enter your logging
 system, because too many things will break a multiline log into
 multiple logs. In some cases you can carefully configure everything to
 handle multiline logs, but it's very fragile and prevents you from using
 many tools and transport mechanisms.
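
As a rough illustration of "collapse the multiline logs into a single line", here is a minimal Python sketch. The marker `#012` and the function names `collapse` and `expand` are assumptions chosen for the example, not something this thread or any particular tool prescribes. The record travels as one line, and a consumer that knows the marker can restore the line breaks for display.

```python
MARKER = "#012"  # assumed placeholder for a line break; any agreed-on token works

def collapse(record: str, marker: str = MARKER) -> str:
    """Turn a multiline record into one line before it enters the log pipeline."""
    return record.replace("\r\n", "\n").replace("\n", marker)

def expand(line: str, marker: str = MARKER) -> str:
    """Restore line breaks at the consuming end (for example, in a UI)."""
    return line.replace(marker, "\n")

# Hypothetical stack trace, for illustration only.
record = "Exception in thread main\n  at Foo.bar(Foo.java:1)\n  at Main.main(Main.java:5)"
one_line = collapse(record)
print(one_line)
print(expand(one_line) == record)
```

The round trip is only exact if the marker never occurs in the original text, which is the kind of knowledge the downstream consumer has to share.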



 Yeah, I know these tend to be a pain. But