Re: [rsyslog] Regex logging to MongoDB

david Thu, 23 Aug 2012 11:39:38 -0700

On Thu, 23 Aug 2012, Miloslav Trmac wrote:

----- Original Message -----

On Wed, 22 Aug 2012, Miloslav Trmac wrote:

----- Original Message -----

Why have your own template engine instead of using the normal rsyslog
template engine?


Primarily I was considering the use case of modifying $!all-json - i.e.
store any incoming record in full, but add an "incoming host name"
(overriding any "incoming host name" value in the record).  I don't
think this can be done with pure textual substitution without
understanding the field structure.  (This is not specific to mongodb -
using a textual log file in JSON format would require similar
understanding of the field data.)  Or perhaps this kind of functionality
isn't useful, or it could be done in a different way?
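
A minimal sketch of the use case described above, in Python rather than
rsyslog configuration. The field name "host", the sample record, and the
helper override_field are all hypothetical; the point is only that the
override works on the parsed field structure and not on the text.

```python
import json

def override_field(record_text, name, value):
    """Parse a JSON record, set one field, and serialize it again."""
    record = json.loads(record_text)   # text -> structured data
    record[name] = value               # add the field, or overwrite it
    return json.dumps(record, sort_keys=True)

# Hypothetical incoming record that already claims a host name.
incoming = '{"msg": "disk full", "host": "claimed-by-sender"}'
print(override_field(incoming, "host", "relay1.example.com"))
# prints {"host": "relay1.example.com", "msg": "disk full"}
```

The same call adds the field when the incoming record has no "host"
entry at all, which is the part a plain textual substitution cannot
decide without knowing the field structure.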


Why should you override whatever the sysadmin has configured? they may
want to add the "incoming host name" to the record and they may not. Why
should your om always log something instead of being like every other om
and logging what the sysadmin tells you to log?


I don't want to hard-code adding a host name field - on the contrary, the use case I 
consider involves the sysadmin defining their desired field set and contents, via a 
template, and keeping the design as described at 
http://www.rsyslog.com/doc/dev_oplugins.html (e.g. "Your plugin will only receive 
what the end user has configured in a $template statement").

It just seems to me that text-based templates don't fit the case where we 
actually want to work with a deeper structure very well - is it even possible 
to write a template like

"%$!all-json_except_that_host_name_field_is_modified_to_a_specific_value%" ?

using the existing field processing facilities? And even if it were
somehow possible to do this using regexps, would the result be
sufficiently user-friendly?

It is possible to craft arbitrary text in a template today, and that text
can end up looking like a JSON string.
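
A small illustration of what "looking like a JSON string" can mean in
practice, sketched in Python under the assumption that the template
simply pastes a field value between quotes. The helper names are
hypothetical and nothing here is rsyslog's own template code.

```python
import json

def paste_into_template(msg):
    # Naive textual substitution, as a plain text template would do it.
    return '{"msg": "' + msg + '"}'

def is_valid_json(text):
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

plain = paste_into_template("disk full")
quoted = paste_into_template('user said "hello"')
# A JSON-aware serializer escapes the embedded quotes itself.
escaped = json.dumps({"msg": 'user said "hello"'})

print(is_valid_json(plain), is_valid_json(quoted), is_valid_json(escaped))
# prints True False True
```

The pasted text is valid JSON only as long as the field value contains
nothing that needs escaping.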

There will be more and easier ways to modify the JSON output in the
future.

As a result, if you throw out the rsyslog template, then you are going to
be duplicating the functionality that rsyslog is going to be providing,
and forcing the admins to learn two sets of template/message modification
configs, the rsyslog version and the ommongodb version.

I can see a fairly reasonable alternative - mostly give up on templates
for ommongodb (e.g. only support an equivalent of %$!all-json%, not
arbitrary templates), and create a separate message modification module
that could be used for arbitrary field editing.
(The field-based templates could potentially be also used in text-based
modules, by using the field-based template, then formatting the result
as a correctly escaped JSON string. This would allow users to store the
logs as text, but use the comparatively easier editing facilities of the
field-based templates. Again, perhaps a separate message modification
module would work better than trying to use templates for this.)

The thing is that this "modification module" should be in rsyslog, not in
one output module. You aren't the only destination that's interested in
getting JSON, and everyone who gets JSON eventually wants to modify it.

Secondarily, this allows building the mongodb inputs without having to
do a fairly expensive field list -> JSON data -> JSON text -> JSON data
-> BSON roundtrip, but that is a secondary concern.


I'm not sure I really understand the idea of reading log files from a
database as an input source. But in any case, why should anything related
to input have an effect on outputs?

I'm focused on a log aggregator machine that receives text data, some of
it in the Lumberjack @cee marking, from one or more hosts, and stores it
all into mongodb. The input is in text, not a database. I'm afraid I
have omitted the initial steps in the configuration, and was imprecise
in other ways.


In more detail, I think this would happen with text-based templates:
1. Lines of text are received using any network protocol
2. mmjsonparse extracts field values from @cee-marked messages:
  2a. a JSON parser converts text into a JSON parser data structure
  2b. ... which is converted into libee data structure

One thing you are missing is how rsyslog does this internally: it makes
one copy of the string and then walks through the string, replacing
spaces with nulls and keeping pointers to the start of each substring.
As a result this is a very efficient process.
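
The approach described above can be sketched roughly as follows. This is
an illustration in Python, not rsyslog's actual code: integer offsets
into a bytearray stand in for C pointers, and the function names
split_in_place and token_at are hypothetical.

```python
def split_in_place(line):
    """Copy the line once, overwrite spaces with NUL, record token starts."""
    buf = bytearray(line)          # the single copy of the input
    starts = []                    # offsets play the role of pointers
    in_token = False
    for i, byte in enumerate(buf):
        if byte == 0x20:           # space: terminate the current token
            buf[i] = 0
            in_token = False
        elif not in_token:         # first byte of a new token
            starts.append(i)
            in_token = True
    return buf, starts

def token_at(buf, start):
    """Read the NUL-terminated token that begins at offset start."""
    end = buf.find(0, start)
    if end == -1:                  # last token has no terminator
        end = len(buf)
    return bytes(buf[start:end])

buf, starts = split_in_place(b"host1 sshd login ok")
print(starts)
# prints [0, 6, 11, 17]
print([token_at(buf, s) for s in starts])
# prints [b'host1', b'sshd', b'login', b'ok']
```

No substring is copied during the walk; each token is just an offset
into the one buffer until something actually reads it.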

3. Rsyslog core processes a template with field values and other
properties pasted, to create a text line:
  3a: libee data structures are repeatedly searched for relevant fields
  3b: each of the fields and other properties is converted into partial strings

These fields are all text to start with; they don't have to be converted
into partial strings.

  3c: these strings are concatenated to create a single text line
(hoping that the user got JSON escaping exactly right)

4. ommongodb receives the template-formed string, and acts on it:
  4a: a JSON parser converts the string into a JSON parser data structure
  4b: ... which is converted into a BSON data structure
  4c: ... which is converted into a BSON byte stream, and finally sent to the 
MongoDB server.

This seems like a fairly inefficient way of doing things. Why not convert
the JSON string directly to a BSON byte stream?
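
A rough sketch of what going from the JSON text to BSON bytes could look
like, in Python. It assumes a flat object whose values are all strings,
and it still uses a JSON parser for the text; what it skips is a
separate intermediate BSON data structure. The function name
json_text_to_bson is hypothetical and this is not how ommongodb is
implemented. The byte layout follows the BSON specification: a
little-endian int32 total length, the elements, and a trailing NUL, with
each string element written as type byte 0x02, NUL-terminated name,
int32 length and NUL-terminated value.

```python
import json
import struct

def json_text_to_bson(text):
    """Encode a flat JSON object with string values as a BSON document."""
    fields = json.loads(text)
    body = b""
    for name, value in fields.items():
        if not isinstance(value, str):
            raise TypeError("this sketch handles string values only")
        key = name.encode("utf-8") + b"\x00"      # element name as cstring
        data = value.encode("utf-8") + b"\x00"    # value with terminator
        # 0x02 marks a UTF-8 string element; length counts the terminator.
        body += b"\x02" + key + struct.pack("<i", len(data)) + data
    total = 4 + len(body) + 1      # length prefix + elements + final NUL
    return struct.pack("<i", total) + body + b"\x00"

print(json_text_to_bson('{"hello": "world"}').hex())
# prints 160000000268656c6c6f0006000000776f726c640000
```

Nested objects, numbers and dates would each need their own element
type, so a real encoder is considerably longer than this.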

With the field-based templates, steps 3 and 4 are:
3. Rsyslog core processes a template with field values and other properties 
pasted, to create a list of named fields
  3a: libee data structures are repeatedly searched for relevant fields
  3b': each of the fields and other properties is individually converted into 
text
  (3c missing)


They are text already; there's no conversion needed in step 3b.

4. ommongodb receives the template-formed list of fields, and acts on it:
  (4a missing)
  4b': The field list is converted into a BSON data structure
  4c: ... which is converted into a BSON byte stream, and finally sent to the 
MongoDB server.

You are missing that ommongodb gets the list of fields and then
"repeatedly searches for the relevant fields" in the data structure.

There is some value in the idea of creating a new module interface that
sends a data structure instead of a string, but this is a fairly
significant change to the core of rsyslog. And even if it turns out to be
a good idea to have a different interface for some modules, it's still a
bad idea to have the manipulation of this list (adding fields, etc) be
done in the output module. It should be done in rsyslog.

i.e. the result can be done with a little less effort.  That's not a decisive 
factor for me, though.
Using the raw "sequence of fields without any formatting" format is not
great, I agree - but then pretending that the template can be an
arbitrary JSON format and we parse it intelligently is not great
either.
However that's definitely open to a change.
I thought I saw you saying that you wanted to send JSON to the database.
If that is the case, then let the sysadmin create the JSON and insert
that.
The MongoDB command format, unlike most SQL databases, does not send a
text command, but something pre-parsed. It is not possible to just send
what the sysadmin created as-is. (It is, of course, possible to just
parse it into JSON and send that with zero semantic modification).
It's common for people to want to have rules along the lines of

(not rsyslog syntax)
if sourceip in $list then set tag X in output log to a fixed value

if you end up creating your own template language
Yes, I was thinking about things like that using templates, and it seems
to me that the current text-based message modification facilities are
not very easy to use for performing such operations on JSON text.
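
For illustration, the kind of rule quoted above ("if sourceip in $list
then set tag X") could be expressed on a parsed record roughly like
this. It is a Python sketch, not rsyslog syntax; the field names
"sourceip" and "tag", the addresses, and the helper tag_known_sources
are all hypothetical.

```python
def tag_known_sources(record, source_list, tag_name, tag_value):
    """Return a copy of the record with the tag set if its source is listed."""
    if record.get("sourceip") in source_list:
        updated = dict(record)         # leave the caller's record untouched
        updated[tag_name] = tag_value  # add or overwrite the tag
        return updated
    return record

# Hypothetical list of sources, using documentation-range addresses.
dmz_hosts = {"192.0.2.10", "192.0.2.11"}

listed = {"sourceip": "192.0.2.10", "msg": "login ok"}
other = {"sourceip": "198.51.100.7", "msg": "login ok"}

print(tag_known_sources(listed, dmz_hosts, "tag", "dmz"))
# prints {'sourceip': '192.0.2.10', 'msg': 'login ok', 'tag': 'dmz'}
print(tag_known_sources(other, dmz_hosts, "tag", "dmz"))
# prints {'sourceip': '198.51.100.7', 'msg': 'login ok'}
```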

It's not good right now, but work on this is ongoing. My point is that
this should be done in rsyslog, not in your module.


David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
