On Thu, 23 Aug 2012, Miloslav Trmac wrote:

----- Original Message -----
On Wed, 22 Aug 2012, Miloslav Trmac wrote:

----- Original Message -----
Why have your own template engine instead of using the normal rsyslog
template engine?

Primarily I was considering the use case of modifying $!all-json - i.e.
store any incoming record in full, but add an "incoming host name"
(overriding any "incoming host name" value in the record).  I don't
think this can be done with pure textual substitution without
understanding the field structure.  (This is not specific to mongodb -
using a textual log file in JSON format would require similar
understanding of the field data.)  Or perhaps this kind of functionality
isn't useful, or it could be done in a different way?
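
A minimal sketch of this use case, in Python rather than rsyslog configuration. The field name "hostname", the sample values, and the helper override_field are assumptions made up for illustration; the only point is that the edit is applied to the parsed structure, not to the text.

```python
import json

def override_field(record_text, name, value):
    """Parse a JSON record, set one field, and serialize it again."""
    record = json.loads(record_text)   # understand the field structure
    record[name] = value               # replaces any value already present
    return json.dumps(record, sort_keys=True)

# Hypothetical incoming record that already claims a host name.
incoming = '{"msg": "disk full", "hostname": "claimed-by-sender"}'
print(override_field(incoming, "hostname", "relay1.example.com"))
```

Doing the same with plain text substitution would need a pattern that copes with nesting, escaping, and a field that may or may not be present.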

Why should you override whatever the sysadmin has configured? They may
want to add the "incoming host name" to the record and they may not. Why
should your om always log something instead of being like every other om
and logging what the sysadmin tells you to log?

I don't want to hard-code adding a host name field - on the contrary, the use case I 
consider involves the sysadmin defining their desired field set and contents, via a 
template, and keeping the design as described at 
http://www.rsyslog.com/doc/dev_oplugins.html (e.g. "Your plugin will only receive 
what the end user has configured in a $template statement").

It just seems to me that text-based templates don't fit the case where we 
actually want to work with a deeper structure very well - is it even possible 
to write a template like
"%$!all-json_except_that_host_name_field_is_modified_to_a_specific_value%"
using the existing field processing facilities? And even if it were somehow possible to do this using regexps, would the result be sufficiently user-friendly?

It is possible to craft arbitrary text in a template today, and that text can end up looking like a JSON string.

There will be more and easier ways to modify the JSON output in the future.

As a result, if you throw out the rsyslog template, then you are going to be duplicating the functionality that rsyslog is going to be providing, and forcing the admins to learn two sets of template/message modification configs: the rsyslog version and the ommongodb version.


I can see a fairly reasonable alternative - mostly give up on templates for ommongodb (e.g. only support an equivalent of %$!all-json%, not arbitrary templates), and create a separate message modification module that could be used for arbitrary field editing.

(The field-based templates could potentially be also used in text-based modules, by using the field-based template, then formatting the result as a correctly escaped JSON string. This would allow users to store the logs as text, but use the comparatively easier editing facilities of the field-based templates. Again, perhaps a separate message modification module would work better than trying to use templates for this.)

The thing is that this "modification module" should be in rsyslog, not in one output module. You aren't the only destination that's interested in getting JSON, and everyone who gets JSON eventually wants to modify it.


Secondarily, this allows building the mongodb inputs without having to
do a fairly expensive field list -> JSON data -> JSON text -> JSON data
-> BSON roundtrip, but that is a secondary concern.

I'm not sure I really understand the idea of reading log files from a
database as an input source. But in any case, why should anything related
to input have an effect on outputs?

I'm focused on a log aggregator machine that receives text data, some of it in the Lumberjack @cee marking, from one or more hosts, and stores it all into mongodb. The input is in text, not a database. I'm afraid I have omitted the initial steps in the configuration, and was imprecise in other ways.

In more detail, I think this would happen with text-based templates:
1. Lines of text are received using any network protocol
2. mmjsonparse extracts field values from @cee-marked messages:
  2a. a JSON parser converts text into a JSON parser data structure
  2b. ... which is converted into libee data structure

One thing you are missing is how rsyslog does this internally: it makes one copy of the string, then walks through that copy, replacing spaces with nulls and keeping pointers to the start of each substring. As a result, this is a very efficient process.
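
As a rough illustration of the approach described above, here is a Python sketch. It is not the actual rsyslog C code: byte offsets stand in for C pointers, and the names split_in_place and token_at are made up for this example.

```python
def split_in_place(line):
    """Copy the line once, turn spaces into NULs, record where each token starts."""
    buf = bytearray(line)            # the single copy of the string
    starts = []
    in_token = False
    for i, byte in enumerate(buf):
        if byte == 0x20:             # space: terminate the current token
            buf[i] = 0
            in_token = False
        elif not in_token:           # first byte of a new token
            starts.append(i)
            in_token = True
    return buf, starts

def token_at(buf, start):
    """Read the NUL-terminated substring that begins at offset start."""
    end = buf.find(b"\x00", start)
    if end == -1:                    # last token has no trailing NUL
        end = len(buf)
    return bytes(buf[start:end])

buf, starts = split_in_place(b"su root failed")
print([token_at(buf, s) for s in starts])
```

No substring is copied during the walk; each token is just an offset into the one shared buffer.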

3. Rsyslog core processes a template with field values and other properties pasted, to create a text line:
  3a: libee data structures are repeatedly searched for relevant fields
  3b: each of the fields and other properties is converted into partial strings

These fields are all text to start with; they don't have to be converted into partial strings.

  3c: these strings are concatenated to create a single text line (hoping that the user got JSON escaping exactly right)




4. ommongodb receives the template-formed string, and acts on it:
  4a: a JSON parser converts the string into a JSON parser data structure
  4b: ... which is converted into a BSON data structure
  4c: ... which is converted into a BSON byte stream, and finally sent to the 
MongoDB server.
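
The escaping concern in step 3c, and its effect on the parse in step 4a, can be sketched in Python. This is an illustration under assumed field names ("msg", "hostname") and made-up helper names; it does not reproduce rsyslog's template engine.

```python
import json

def naive_template(msg, host):
    """Paste values into JSON-looking text with no escaping (step 3c done carelessly)."""
    return '{"msg": "' + msg + '", "hostname": "' + host + '"}'

def escaped_template(msg, host):
    """Same layout, but each value is escaped as a JSON string before pasting."""
    return '{"msg": ' + json.dumps(msg) + ', "hostname": ' + json.dumps(host) + '}'

def parses(text):
    """Step 4a: report whether a JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except ValueError:               # json.JSONDecodeError is a ValueError
        return False

msg = 'user said "hello"'            # a value containing double quotes
print(parses(naive_template(msg, "web1")))
print(parses(escaped_template(msg, "web1")))
```

A single unescaped quote in a message is enough to make the whole record unparseable on the receiving side.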

This seems like a fairly inefficient way of doing things. Why not convert the JSON string directly to a BSON byte stream?

With the field-based templates, steps 3 and 4 become:
3. Rsyslog core processes a template with field values and other properties 
pasted, to create a list of named fields
  3a: libee data structures are repeatedly searched for relevant fields
  3b': each of the fields and other properties is individually converted into 
text
  (3c missing)

They are text already; there's no conversion needed in step 3b.

4. ommongodb receives the template-formed list of fields, and acts on it:
  (4a missing)
  4b': The field list is converted into a BSON data structure
  4c: ... which is converted into a BSON byte stream, and finally sent to the 
MongoDB server.
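
Steps 4b' and 4c can be sketched in Python as a direct conversion from a list of named text fields to a BSON byte stream. This is only a sketch: it handles string values only, the function names are made up for illustration, and a real module would presumably use a BSON library instead. The byte layout follows the BSON specification (int32 document size, type 0x02 string elements, terminating NUL).

```python
import struct

def encode_cstring(text):
    """BSON cstring: UTF-8 bytes followed by a single NUL."""
    return text.encode("utf-8") + b"\x00"

def encode_string_element(name, value):
    """BSON string element: type 0x02, name, int32 length (counting the NUL), bytes, NUL."""
    data = value.encode("utf-8") + b"\x00"
    return b"\x02" + encode_cstring(name) + struct.pack("<i", len(data)) + data

def encode_document(fields):
    """BSON document: int32 total size, the elements, then a terminating NUL."""
    body = b"".join(encode_string_element(n, v) for n, v in fields)
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

# Hypothetical field list as it might arrive from a field-based template.
fields = [("hostname", "relay1.example.com"), ("msg", "disk full")]
print(encode_document(fields).hex())
```

No JSON text is produced or parsed anywhere on this path.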

You are missing that ommongodb gets the list of fields and then "repeatedly searches for the relevant fields" in the data structure.

There is some value in the idea of creating a new module interface that sends a data structure instead of a string, but this is a fairly significant change to the core of rsyslog. And even if it turns out to be a good idea to have a different interface for some modules, it's still a bad idea to have the manipulation of this list (adding fields, etc) be done in the output module. It should be done in rsyslog.


i.e. the result can be achieved with a little less effort.  That's not a decisive 
factor for me, though.


Using the raw "sequence of fields without any formatting" format is not
great, I agree - but then pretending that the template can be an
arbitrary JSON format and we parse it intelligently is not great
either.
However that's definitely open to a change.

I thought I saw you saying that you wanted to send JSON to the database.
If that is the case, then let the sysadmin create the JSON and insert
that.

The MongoDB command format, unlike most SQL databases, does not send a text command, but something pre-parsed. It is not possible to just send what the sysadmin created as-is. (It is, of course, possible to just parse it into JSON and send that with zero semantic modification).


It's common for people to want to have rules along the lines of

(not rsyslog syntax)
if sourceip in $list then set tag X in output log to a fixed value
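
The rule above could be sketched in Python as follows. All names here (apply_rule, the "tag" field, the sample addresses) are hypothetical and chosen only to illustrate the shape of such a rule; none of this is rsyslog syntax.

```python
def apply_rule(record, source_ip, ip_list, tag_name, fixed_value):
    """If the source IP is in the list, set one tag in a copy of the record."""
    result = dict(record)            # leave the caller's record untouched
    if source_ip in ip_list:
        result[tag_name] = fixed_value
    return result

# Hypothetical list of addresses and a sample record.
dmz_hosts = {"192.0.2.10", "192.0.2.11"}
record = {"msg": "login failed", "tag": "sshd"}
print(apply_rule(record, "192.0.2.10", dmz_hosts, "tag", "dmz"))
print(apply_rule(record, "198.51.100.7", dmz_hosts, "tag", "dmz"))
```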

if you end up creating your own template language

Yes, I was thinking about things like that using templates, and it seems to me that the current text-based message modification facilities are not very easy to use for performing such operations on JSON text.

It's not good right now, but work on this is ongoing. My point is that this should be done in rsyslog, not in your module.

David Lang
