Thanks for the comments.

----- Original Message -----
> First off, let me say that I'm not very familiar with mongodb, so it may
> work differently than I am thinking it does.

Likewise, I'm quite new to all of this.

> On Wed, 22 Aug 2012, Miloslav Trmac wrote:
>
> > ----- Original Message -----
> >> Why have your own template engine instead of using the normal rsyslog
> >> template engine?
> >
> > Primarily I was considering the use case of modifying $!all-json - i.e.
> > store any incoming record in full, but add an "incoming host name"
> > (overriding any "incoming host name" value in the record). I don't
> > think this can be done with pure textual substitution without
> > understanding the field structure. (This is not specific to mongodb -
> > using a textual log file in JSON format would require similar
> > understanding of the field data.) Or perhaps this kind of functionality
> > isn't useful, or it could be done in a different way?
>
> Why should you override whatever the sysadmin has configured? They may
> want to add the "incoming host name" to the record and they may not. Why
> should your om always log something instead of being like every other om
> and logging what the sysadmin tells you to log?

I don't want to hard-code adding a host name field - on the contrary, the
use case I consider involves the sysadmin defining their desired field set
and contents via a template, and keeping the design as described at
http://www.rsyslog.com/doc/dev_oplugins.html (e.g. "Your plugin will only
receive what the end user has configured in a $template statement").

It just seems to me that text-based templates don't fit very well the case
where we actually want to work with a deeper structure - is it even
possible to write a template like
> "%$!all-json_except_that_host_name_field_is_modified_to_a_specific_value%"
using the existing field processing facilities? And even if it were
somehow possible to do this using regexps, would the result be
sufficiently user-friendly?

I can see a fairly reasonable alternative - mostly give up on templates
for ommongodb (e.g. only support an equivalent of %$!all-json%, not
arbitrary templates), and create a separate message modification module
that could be used for arbitrary field editing.

(The field-based templates could potentially also be used in text-based
modules, by using the field-based template, then formatting the result as
a correctly escaped JSON string. This would allow users to store the logs
as text, but use the comparatively easier editing facilities of the
field-based templates. Again, perhaps a separate message modification
module would work better than trying to use templates for this.)

> > Secondarily, this allows building the mongodb inputs without having to
> > do a fairly expensive field list -> JSON data -> JSON text -> JSON data
> > -> BSON roundtrip, but that is a secondary concern.
>
> I'm not sure I really understand the idea of reading log files from a
> database as an input source. But in any case, why should anything related
> to input have an effect on outputs?

I'm focused on a log aggregator machine that receives text data, some of
it in the Lumberjack @cee marking, from one or more hosts, and stores it
all into mongodb. The input is in text, not a database. I'm afraid I have
omitted the initial steps in the configuration, and was imprecise in other
ways.

In more detail, I think this would happen with text-based templates:

1. Lines of text are received using any network protocol
2. mmjsonparse extracts field values from @cee-marked messages:
   2a. a JSON parser converts text into a JSON parser data structure
   2b. ... which is converted into a libee data structure
3. Rsyslog core processes a template with field values and other
   properties pasted, to create a text line:
   3a. libee data structures are repeatedly searched for relevant fields
   3b. each of the fields and other properties is converted into partial
       strings
   3c. these strings are concatenated to create a single text line
       (hoping that the user got JSON escaping exactly right)
4. ommongodb receives the template-formed string, and acts on it:
   4a. a JSON parser converts the string into a JSON parser data structure
   4b. ... which is converted into a BSON data structure
   4c. ... which is converted into a BSON byte stream, and finally sent to
       the MongoDB server.

With the field-based templates, steps 3 and 4 become:

3. Rsyslog core processes a template with field values and other
   properties pasted, to create a list of named fields:
   3a. libee data structures are repeatedly searched for relevant fields
   3b'. each of the fields and other properties is individually converted
        into text
   (3c missing)
4. ommongodb receives the template-formed list of fields, and acts on it:
   (4a missing)
   4b'. the field list is converted into a BSON data structure
   4c. ... which is converted into a BSON byte stream, and finally sent to
       the MongoDB server.

i.e. the same result can be achieved with a little less effort. That's not
a decisive factor for me, though.

> > Using the raw "sequence of fields without any formatting" format is not
> > great, I agree - but then pretending that the template can be an
> > arbitrary JSON format and we parse it intelligently is not great
> > either.
> > However that's definitely open to a change.
>
> I thought I saw you saying that you wanted to send JSON to the database.
> If that is the case, then let the sysadmin create the JSON and insert
> that.

The MongoDB command format, unlike that of most SQL databases, does not
carry a text command, but something pre-parsed. It is not possible to just
send what the sysadmin created as-is. (It is, of course, possible to just
parse it as JSON and send the result with zero semantic modification.)

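To make the difference between the two routes more concrete, here is a rough Python sketch. It is only an illustration, not rsyslog or ommongodb code: the field values and the function names are made up, the careless template is an assumption about what a user might write, and a plain dict stands in for the BSON data structure.

```python
import json

# Hypothetical field values, standing in for what mmjsonparse would extract.
fields = {"host_name": "server1", "msg": 'user "bob" logged in'}

def text_template_route(fields):
    """Steps 3b-3c and 4a: paste values into one text line, then re-parse it."""
    # A careless text template: values are pasted without any JSON escaping.
    line = '{"host_name": "%s", "msg": "%s"}' % (fields["host_name"], fields["msg"])
    # Step 4a: the output module has to parse the line again.
    return json.loads(line)

def field_based_route(fields):
    """Steps 3b' and 4b': pass the named fields on, with no text round trip."""
    # A plain dict stands in for the BSON data structure here.
    return dict(fields)

def text_route_ok(fields):
    """Report whether the text route survives the re-parsing step."""
    try:
        text_template_route(fields)
        return True
    except json.JSONDecodeError:
        return False

print(text_route_ok(fields))      # the embedded quotes break the pasted JSON
print(field_based_route(fields))  # the field list is passed through untouched
```

The text route only works as long as the template author gets the escaping exactly right; the field-based route never builds the intermediate text, so there is nothing to escape.
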
> It's common for people to want to have rules along the lines of
>
> (not rsyslog syntax)
> if sourceip in $list then set tag X in output log to a fixed value
>
> if you end up creating your own template language

Yes, I was thinking about doing things like that using templates, and it
seems to me that the current text-based message modification facilities
are not very easy to use for performing such operations on JSON text.

> >> The users are already familiar with, and that gets extended to cover new
> >> things without you having to duplicate the effort. This includes the
> >> ability to dump all properties out as JSON and is going to include
> >> abilities to modify the fields and leave some out of the output in the
> >> future (as part of the entire lumberjack related effort)
> > This sounds interesting, is there any code I could look at?
>
> start by looking at the existing json formatting options for templates,

Right, I was specifically interested in the abilities to modify fields or
leave them out. I don't think that's currently possible, or is it?

> >>> - What to do about non-string values? mongodb recognizes different
> >>> types, and it would be good to use the native one (so that numbers
> >>> instead of strings could be compared - note that there is no automatic
> >>> type conversion when comparing different types in mongodb). Non-string
> >>> types can be handled on output (by adding
> >>> "format-as-int"/"format-as-date" options), but AFAICS on input /
> >>> mmjsonparse everything is treated as strings. Is it at this point
> >>> realistic to think about preserving the type of data as presented on
> >>> input while reformatting it (e.g. by using mmjsonparse, and a template
> >>> with $!all-json and some of the above-mentioned field "edits")? Or is
> >>> rsyslog so fundamentally based on strings that this would take too much
> >>> work? (There is always the option to preserve types simply by treating
> >>> the JSON as unmodified plaintext).

(Supposedly libee will be changed to use a JSON representation, which
would resolve my concern. I'm just showing the details below for
completeness.)

<snip>
> > The case I was thinking about is the same as above - keep the original
> > JSON, but add or modify a little.
> >
> > So, if the input event is
> >> @cee: {"field1": "string", "field2": 5.0, "field3": [1,2,3]}
> > I would like the data stored in MongoDB to directly correspond,
> > e.g.
> >> {"host_name":"server1", "field1": "string", "field2": 5.0,
> >> "field3": [1,2,3]}
> > not modify it to, say,
> >> {"host_name":"server1", "field1": "string", "field2": "5.0",
> >> "field3": "[1,2,3]"}
>
> and what makes you think that rsyslog is going to change it instead of
> keeping it the same?

(reordered)
> however, I suspect that the incoming @cee formatted message is going to
> have all fields quoted.

No, the above is valid JSON without quoting every value - "field2" is a
float. It turns out it is not a valid Lumberjack record because
https://fedorahosted.org/lumberjack/wiki/SyntaxFormats#FieldTypes
prohibits arrays, so let's instead use:
> @cee: {"field1": "string", "field2": 5.0, "field3": {"sub1":"foo",
> "sub2":"bar"}}

With the following sample configuration:
> #this template should eventually modify the output more
> $Template CEETemplate,"@cee: %$!all-json%\n"
> $ModLoad imuxsock
> $ModLoad mmjsonparse
> *.* :mmjsonparse:
> & /path/to/ceelog;CEETemplate

running
> logger -d -u /path/to/dev/log -p mail.info -t mymailer '@cee: {"field1":
> "string", "field2": 5.0, "field3": {"sub1":"foo", "sub2":"bar"}}'

results in
> @cee: {"field1": "string", "field2": "5", "field3.sub1": "foo",
> "field3.sub2": "bar"}

i.e.:
* "field2" was converted from a float to an integer-as-string
* The "field3" subobject was flattened into separate "field3.sub1" and
  "field3.sub2" string fields

> > MongoDB documentation seems to suggest that it doesn't support comparing
> > mixed field types much, so changing the value types sounds undesirable.
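
For illustration, the kind of type loss seen in the logger test can be mimicked with a short Python sketch. This is not how rsyslog or libee are implemented; flatten_to_strings and set_host_name are hypothetical helpers, and the "%g" formatting is only an assumption chosen to reproduce the "5" observed in the output. The second helper shows the structured edit I would like to be possible instead.

```python
import json

# The replacement test record used with logger.
record = json.loads(
    '{"field1": "string", "field2": 5.0, "field3": {"sub1": "foo", "sub2": "bar"}}'
)

def flatten_to_strings(obj, prefix=""):
    """Mimic a string-only field store: dotted names, every value as text."""
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            # Nested objects disappear; only dotted leaf names remain.
            flat.update(flatten_to_strings(value, name + "."))
        elif isinstance(value, float):
            # Assumed formatting, picked so that 5.0 becomes "5".
            flat[name] = "%g" % value
        else:
            flat[name] = str(value)
    return flat

def set_host_name(obj, host):
    """Structured edit: override one field, keep all other values and types."""
    edited = dict(obj)
    edited["host_name"] = host
    return edited

print(json.dumps(flatten_to_strings(record)))
print(json.dumps(set_host_name(record, "server1")))
```

In the first result "field2" is a string and "field3" no longer exists as an object; in the second, "field2" is still a number and "field3" is still a subobject, with only "host_name" added.
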
>
> you don't need to have your own template language to do this, let the
> sysadmin specify the format (go ahead and have a default format if the
> sysadmin doesn't specify one).

The sysadmin could specify a specific format for particular fields, but
%$!all-json% should probably not modify the message contents as shown
above; it does so today because rsyslog and libee are fundamentally based
on strings, and not really able to represent the more general JSON values.

    Mirek

