On 2/20/2018 6:39 PM, deoren wrote:
I've been attempting to use the re_extract() function quite a bit lately
to write some simple "filters" for notification purposes. I struggled
with the syntax for a while until I realized that the and have been
struggling quite a bit with the regex support for the re_extract()
function. According to the http://www.rsyslog.com/regex/ page (and the
re_extract function doc), Rsyslog uses POSIX ERE and "optionally" BRE
expressions.
* Does anyone have a good guide or reference for the syntax needed?
* How do you switch the regex type from ERE to BRE? At a glance it
appears that the BRE format is more cumbersome, so I want to make sure
that I don't unintentionally switch that mode on somehow.
I found the differences between the two briefly described on this page:
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
Does Rsyslog have complete support for ERE expressions? In other words,
if I find a guide which covers ERE thoroughly, is that sufficient or are
there gaps in rsyslog's support for the ERE syntax that I should be
aware of?
Thanks.
Addendum to my earlier questions (which are still valid and "open" for
feedback):
Real world example of what I'm working with (single line, likely wrapped
by my mail client):
123.123.123.123 - abc1234 [20/Feb/2018:10:36:01 -0600] "GET
http://example.org:80/servlet/SPECIFIC_PATTERN_HERE HTTP/1.1" 200 2182
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like
Gecko) Chrome/39.0.2171.95 Safari/537.36"
Here is a PCRE regex that _seems_ to do what I want:
^([0-9]+.[0-9]+.[0-9]+.[0-9]+)\s\-\s([A-Za-z0-9]+)\s\[([0-9A-Za-z:\/\s-]+)\]
and provides me with three match group results I can reference:
1. 123.123.123.123
2. abc1234
3. 20/Feb/2018:10:36:01 -0600
As I understand it, re_extract allows retrieving only a specific match
at a time, so I grab the two I care about like so and save to local
variables (this processing is done on the primary receiver):
set $.remote-ip = re_extract(
    $msg,
    "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s.\\s([A-Za-z0-9]+)\\s",
    0, 1,
    'unknown remote ip');
set $.remote-user = re_extract(
    $msg,
    "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s.\\s([A-Za-z0-9]+)\\s",
    0, 2,
    'unknown remote user');
This seems to work and only required escaping the escape character (how
"meta").
I've read that mmnormalize is recommended over regexes for performance
reasons, but I have little experience with liblognorm (other than
knowing it exists). Am I better off writing a few regex matches like I'm
doing above or crafting (and testing) liblognorm rulesets, using them
with mmnormalize to generate a JSON structure and then pulling what I
want from a JSON structure?
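For comparison, a rough sketch of what the mmnormalize route might look
like. This is untested and rests on assumptions: the rulebase path and
the field names (remoteip, remoteuser, timestamp, rest) are made up for
illustration, and the rule itself is liblognorm syntax that would live
in its own file, so it is shown here only as comments.

```
# Sketch only; path and field names are hypothetical.
# Assumed contents of /etc/rsyslog.d/access.rulebase (liblognorm syntax,
# not RainerScript):
#   version=2
#   rule=:%remoteip:ipv4% - %remoteuser:word% [%timestamp:char-to:]%] %rest:rest%

module(load="mmnormalize")

# Parse $msg against the rulebase; parsed fields land under $! by default.
action(type="mmnormalize" rulebase="/etc/rsyslog.d/access.rulebase")

# Copy the fields of interest into local variables only if parsing worked.
if $parsesuccess == "OK" then {
    set $.remote-ip = $!remoteip;
    set $.remote-user = $!remoteuser;
}
```

The trade-off, as far as I can tell, is one rule to write and test in
place of one re_extract() call per value.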
In this case, my specific goal is to look for log messages containing
"SPECIFIC_PATTERN_HERE" (as shown in sample log message) and if a match
is found parse the message to pull out specific values. Those values are
then used to generate a notification for our ticketing system (e.g.,
specific URL patterns indicate abuse that we need to review further
before our vendor contacts us and threatens to cut off service). In this
case we're not matching a possible range of patterns, but a very
specific string that is known to us.
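A minimal sketch of that flow, assuming a plain substring test is enough
to pick out the messages of interest. The template name, the output file
path and the use of omfile as a stand-in for the real notification
action are all hypothetical; only the "contains" test and the
re_extract() calls reflect what is described above.

```
# Hypothetical template: one line per hit with the extracted values.
template(name="abuse-notice" type="string"
         string="remote-ip=%$.remote-ip% remote-user=%$.remote-user% msg=%msg%\n")

# Run the regex extraction only for messages containing the known string.
if $msg contains "SPECIFIC_PATTERN_HERE" then {
    set $.remote-ip = re_extract(
        $msg,
        "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s.\\s([A-Za-z0-9]+)\\s",
        0, 1,
        'unknown remote ip');
    set $.remote-user = re_extract(
        $msg,
        "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s.\\s([A-Za-z0-9]+)\\s",
        0, 2,
        'unknown remote user');

    # Placeholder for whatever action feeds the ticketing system.
    action(type="omfile" file="/var/log/abuse-candidates.log"
           template="abuse-notice")
}
```

Testing with "contains" first means the regex is evaluated only for the
few messages that carry the known string.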
I know there are dedicated tools for pattern matching and reporting
(Graylog is something I'm kicking the tires on and I've heard that
Riemann is designed for tasks like this), but I was hoping to get some
basic monitoring in place now with a tool that I'm halfway familiar with
before attempting to implement other tools for easier management of more
complex patterns. I've already implemented 4-5 other notifications and
it's worked well thus far, but I wanted to get input from the community
to see if I'm going about this the wrong way (first using regexes over
mmnormalize, then as a secondary issue using rsyslog for notifications
vs Graylog or Riemann).
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog