Thanks for the comments.
----- Original Message -----
> First off, let me say that I'm not very familiar with mongodb, so it may
> work differently than I am thinking it does.

Likewise, I'm quite new to all of this.


> On Wed, 22 Aug 2012, Miloslav Trmac wrote:
> 
> > ----- Original Message -----
> >> Why have your own template engine instead of using the normal rsyslog
> >> template engine?
> >
> > Primarily I was considering the use case of modifying $!all-json - i.e.
> > store any incoming record in full, but add an "incoming host name"
> > (overriding any "incoming host name" value in the record).  I don't
> > think this can be done with pure textual substitution without
> > understanding the field structure.  (This is not specific to mongodb -
> > using a textual log file in JSON format would require similar
> > understanding of the field data.)  Or perhaps this kind of functionality
> > isn't useful, or it could be done in a different way?
> 
> Why should you override whatever the sysadmin has configured? They may
> want to add the "incoming host name" to the record and they may not. Why
> should your om always log something instead of being like every other om
> and logging what the sysadmin tells you to log?

I don't want to hard-code adding a host name field - on the contrary, the use 
case I consider involves the sysadmin defining their desired field set and 
contents, via a template, and keeping the design as described at 
http://www.rsyslog.com/doc/dev_oplugins.html (e.g. "Your plugin will only 
receive what the end user has configured in a $template statement").

It just seems to me that text-based templates are not a good fit for the case 
where we actually want to work with a deeper structure - is it even possible 
to write a template like
> "%$!all-json_except_that_host_name_field_is_modified_to_a_specific_value%" ?
using the existing field processing facilities?  And even if it were somehow 
possible to do this using regexps, would the result be sufficiently 
user-friendly?
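
To illustrate what I mean by working with the structure rather than the text, here is a minimal Python sketch.  It is not rsyslog code: the function name override_field and the sample record are made up for illustration.  It parses the record, sets one field, and serializes it again, so the other fields keep their types and no escaping has to be done by hand.

```python
import json


def override_field(record_json, name, value):
    """Return record_json with field `name` set to `value` (hypothetical helper)."""
    record = json.loads(record_json)   # text -> structured data
    record[name] = value               # add the field, or replace an existing one
    return json.dumps(record)          # structured data -> correctly escaped text


incoming = '{"host_name": "claimed-by-sender", "field1": "string", "field2": 5.0}'
print(override_field(incoming, "host_name", "server1"))
```

Doing the same with textual substitution or regexps would require the template to know where the existing "host_name" value starts and ends, which is exactly the field structure that a text template does not see.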


I can see a fairly reasonable alternative - mostly give up on templates for 
ommongodb (e.g. only support an equivalent of %$!all-json%, not arbitrary 
templates), and create a separate message modification module that could be 
used for arbitrary field editing.

(The field-based templates could potentially be also used in text-based 
modules, by using the field-based template, then formatting the result as a 
correctly escaped JSON string.  This would allow users to store the logs as 
text, but use the comparatively easier editing facilities of the field-based 
templates.  Again, perhaps a separate message modification module would work 
better than trying to use templates for this.)


> > Secondarily, this allows building the mongodb inputs without having to
> > do a fairly expensive field list -> JSON data -> JSON text -> JSON data
> > -> BSON roundtrip, but that is a secondary concern.
> 
> I'm not sure I really understand the idea of reading log files from a
> database as an input source. But in any case, why should anything related
> to input have an effect on outputs?

I'm focused on a log aggregator machine that receives text data, some of it in 
the Lumberjack @cee marking, from one or more hosts, and stores it all into 
mongodb.  The input is in text, not a database.  I'm afraid I have omitted the 
initial steps in the configuration, and was imprecise in other ways.

In more detail, I think this would happen with text-based templates:
1. Lines of text are received using any network protocol
2. mmjsonparse extracts field values from @cee-marked messages:
   2a. a JSON parser converts text into a JSON parser data structure
   2b. ... which is converted into libee data structure
3. Rsyslog core processes a template with field values and other properties 
pasted, to create a text line:
   3a: libee data structures are repeatedly searched for relevant fields
   3b: each of the fields and other properties is converted into partial strings
   3c: these strings are concatenated to create a single text line (hoping that 
the user got JSON escaping exactly right)
4. ommongodb receives the template-formed string, and acts on it:
   4a: a JSON parser converts the string into a JSON parser data structure
   4b: ... which is converted into a BSON data structure
   4c: ... which is converted into a BSON byte stream, and finally sent to the 
MongoDB server.

With the field-based templates, steps 3 and 4 become:
3. Rsyslog core processes a template with field values and other properties 
pasted, to create a list of named fields
   3a: libee data structures are repeatedly searched for relevant fields
   3b': each of the fields and other properties is individually converted into 
text
   (3c missing)
4. ommongodb receives the template-formed list of fields, and acts on it:
   (4a missing)
   4b': The field list is converted into a BSON data structure
   4c: ... which is converted into a BSON byte stream, and finally sent to the 
MongoDB server.

i.e. the same result can be achieved with a little less effort.  That's not a 
decisive factor for me, though.
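
The two paths above can be compared in a small Python sketch.  This is only an analogy, not rsyslog or ommongodb code: text_template, via_text and via_fields are hypothetical names, and the "template" is a deliberately naive one that does no JSON escaping, standing in for a user-written template that did not get the escaping exactly right.

```python
import json


def text_template(fields):
    """Steps 3b/3c: paste values into text, with no escaping (the risky part)."""
    parts = ['"%s": "%s"' % (name, value) for name, value in fields]
    return "{" + ", ".join(parts) + "}"


def via_text(fields):
    """Text-based path: build a text line, then parse it again (step 4a)."""
    return json.loads(text_template(fields))


def via_fields(fields):
    """Field-based path: hand the named fields over directly, no re-parsing."""
    return dict(fields)


good = [("host_name", "server1"), ("field1", "string")]
bad = [("host_name", "server1"), ("msg", 'he said "hi"')]

print(via_text(good) == via_fields(good))   # both paths agree on simple values
print(via_fields(bad))                      # the field path keeps the quotes intact
try:
    via_text(bad)
except ValueError as err:                   # json.JSONDecodeError is a ValueError
    print("text path failed:", err)
```

The field-based path skips the build-text-then-parse round trip, and as a side effect it cannot produce a line that fails to parse.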


> > Using the raw "sequence of fields without any formatting" format is not
> > great, I agree - but then pretending that the template can be an
> > arbitrary JSON format and we parse it intelligently is not great
> > either.
> > However that's definitely open to a change.
> 
> I thought I saw you saying that you wanted to send JSON to the database.
> If that is the case, then let the sysadmin create the JSON and insert
> that.

Unlike most SQL databases, MongoDB does not accept a text command, but 
something pre-parsed.  It is not possible to just send what the sysadmin 
created as-is.  (It is, of course, possible to just parse the JSON text and 
send the parsed result with zero semantic modification.)
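
To show what "pre-parsed" means here, below is a minimal Python sketch of turning JSON text into a BSON byte stream.  It is an illustration under stated assumptions, not what ommongodb does: encode_document is a made-up helper that covers only strings, doubles and embedded documents from the public BSON specification, and real code would use a MongoDB client library instead.

```python
import json
import struct


def _cstring(text):
    """BSON cstring: UTF-8 bytes followed by a NUL terminator."""
    return text.encode("utf-8") + b"\x00"


def encode_document(doc):
    """Encode a dict of str/float/dict values as BSON (illustrative subset only)."""
    body = b""
    for key, value in doc.items():
        name = _cstring(key)
        if isinstance(value, str):
            data = _cstring(value)
            # type 0x02: int32 byte length (including the NUL), then the bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, float):
            # type 0x01: 64-bit little-endian IEEE 754 double
            body += b"\x01" + name + struct.pack("<d", value)
        elif isinstance(value, dict):
            # type 0x03: embedded document
            body += b"\x03" + name + encode_document(value)
        else:
            raise TypeError("type not covered by this sketch: %r" % type(value))
    # int32 total length (4 length bytes + body + 1 terminator), body, NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


text = '{"field1": "string", "field2": 5.0, "field3": {"sub1": "foo"}}'
print(encode_document(json.loads(text)).hex())
```

Note that the double and the embedded document each get their own type byte, so the value types have to be known at this point; a record where everything has already become a string can only be stored as strings.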


> It's common for people to want to have rules along the lines of
> 
> (not rsyslog syntax)
> if sourceip in $list then set tag X in output log to a fixed value
> 
> if you end up creating your own template language

Yes, I was thinking about things like that using templates, and it seems to me 
that the current text-based message modification facilities are not very easy 
to use for performing such operations on JSON text.


> >> The users are already familiar with, and that gets extended to cover new
> >> things without you having to duplicate the effort. This includes the
> >> ability to dump all properties out as JSON and is going to include
> >> abilities to modify the fields and leave some out of the output in the
> >> future (as part of the entire lumberjack related effort)
> > This sounds interesting, is there any code I could look at?
> 
> start by looking at the existing json formatting options for templates,

Right, I was specifically interested in the abilities to modify fields or leave 
them out.  I don't think that's currently possible, or is it?


> >>> - What to do about non-string values?  mongodb recognizes different
> >>> types, and it would be good to use the native one (so that numbers
> >>> instead of strings could be compared - note that there is no automatic
> >>> type conversion when comparing different types in mongodb).  Non-string
> >>> types can be handled on output (by adding
> >>> "format-as-int"/"format-as-date" options), but AFAICS on input /
> >>> mmjsonparse everything is treated as strings.  Is it at this point
> >>> realistic to think about preserving the type of data as presented on
> >>> input while reformatting it (e.g. by using mmjsonparse, and a template
> >>> with $!all-json and some of the above-mentioned field "edits")?  Or is
> >>> rsyslog so fundamentally based on strings that this would take too much
> >>> work?  (There is always the option to preserve types simply by treating
> >>> the JSON as unmodified plaintext).

(Supposedly libee will be changed to use a JSON representation, which would 
resolve my concern.  I'm just showing the details below for completeness.)

<snip>
> > The case I was thinking about is the same as above - keep the original
> > JSON, but add or modify a little.
> >
> > So, if the input event is
> >> @cee: {"field1": "string", "field2": 5.0, "field3": [1,2,3]}
> > I would like the data stored in MongoDB to directly correspond,
> > e.g.
> >> {"host_name":"server1", "field1": "string", "field2": 5.0,
> >> "field3": [1,2,3]}
> > not modify it to, say,
> >> {"host_name":"server1", "field1": "string", "field2": "5.0",
> >> "field3": "[1,2,3]"}
> 
> and what makes you think that rsyslog is going to change it instead of
> keeping it the same?
(reordered)
> 
> however, I suspect that the incoming @cee formatted message is going to
> have all fields quoted.
No, the above is valid JSON without any quoting - "field2" is a float.  It 
turns out it is not a valid Lumberjack record because 
https://fedorahosted.org/lumberjack/wiki/SyntaxFormats#FieldTypes prohibits 
arrays, so let's instead use:
> @cee: {"field1": "string", "field2": 5.0, "field3": {"sub1":"foo", 
> "sub2":"bar"}}

With the following sample configuration:
> #this template should eventually modify the output more
> $Template CEETemplate,"@cee: %$!all-json%\n"
> $ModLoad imuxsock
> $ModLoad mmjsonparse
> *.* :mmjsonparse:
> &   /path/to/ceelog;CEETemplate

running
> logger -d -u /path/to/dev/log -p mail.info -t mymailer '@cee: {"field1": 
> "string", "field2": 5.0, "field3": {"sub1":"foo", "sub2":"bar"}}'
results in
> @cee: {"field1": "string", "field2": "5", "field3.sub1": "foo", 
> "field3.sub2": "bar"}
i.e.:
* "field2" was converted from a float to an integer-as-string
* The "field3" subobject was converted into something else

> > MongoDB documentation seems to suggest that it doesn't support comparing
> > mixed field types much, so changing the value types sounds undesirable.
> 
> you don't need to have your own template language to do this, let the
> sysadmin specify the format (go ahead and have a default format if the
> sysadmin doesn't specify one).

The sysadmin could specify a specific format for particular fields, but 
%$!all-json% should probably not modify the message contents as shown above: 
rsyslog and libee are currently fundamentally based on strings, and not really 
able to represent the more general JSON values.
    Mirek
