First off, let me say that I'm not very familiar with mongodb, so it may
work differently than I am thinking it does.
David Lang
On Wed, 22 Aug 2012, Miloslav Trmac wrote:
- Is this approach reasonable? The problem with this is that the "field"
treatment of the template is so different from other cases; there is a
precedent with omoracledb's use of ...AS_ARRAY, but that's only a single
module.
Why have your own template engine instead of using the normal rsyslog
template engine?
Primarily I was considering the use case of modifying $!all-json - i.e.
store any incoming record in full, but add an "incoming host name"
(overriding any "incoming host name" value in the record). I don't
think this can be done with pure textual substitution without
understanding the field structure. (This is not specific to mongodb -
using a textual log file in JSON format would require similar
understanding of the field data.) Or perhaps this kind of functionality
isn't useful, or it could be done in a different way?
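
A minimal sketch of the point about field structure, in Python rather than rsyslog configuration. It only illustrates why overriding one field is easier on parsed data than on text; the field name "host_name", the sample record, and the helper override_field are hypothetical, not anything from rsyslog or the mongodb module.

```python
import json

def override_field(record_text, name, value):
    """Parse a JSON record, set one field (replacing any old value), re-serialize."""
    record = json.loads(record_text)
    record[name] = value
    return json.dumps(record)

# Hypothetical incoming record that already carries a host name.
incoming = '{"host_name": "spoofed", "msg": "disk full"}'

# Textual substitution: pasting a field in front leaves two "host_name" keys,
# and Python's json parser keeps the last one (the spoofed value).
pasted = '{"host_name": "server1", ' + incoming[1:]
print(json.loads(pasted)["host_name"])   # prints: spoofed

# Structured edit: exactly one "host_name" remains, with the value we chose.
print(override_field(incoming, "host_name", "server1"))
# prints: {"host_name": "server1", "msg": "disk full"}
```

How a given consumer treats duplicate keys varies, which is part of why the pasted form is fragile.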
Why should you override whatever the sysadmin has configured? They may
want to add the "incoming host name" to the record and they may not. Why
should your om always log something instead of being like every other om
and logging what the sysadmin tells you to log?
Secondarily, this allows building the mongodb inputs without having to
do a fairly expensive field list -> JSON data -> JSON text -> JSON data
-> BSON roundtrip, but that is a secondary concern.
I'm not sure I really understand the idea of reading log files from a
database as an input source. But in any case, why should anything related
to input have an effect on outputs?
Note that JSON data is always text; BSON has binary representations of
fields, but JSON does not.
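
To make the text-versus-binary distinction concrete, here is a small Python sketch using only the standard library. It shows just the value payload: BSON stores a double as 8 little-endian IEEE 754 bytes, while JSON stores the same number as characters. A full BSON element also carries a type byte and the field name, which this sketch leaves out.

```python
import json
import struct

value = 5.0

# JSON: the number is written out as text.
as_json_text = json.dumps(value)

# BSON-style payload for a double: 8 bytes, little-endian IEEE 754.
# (Only the payload; the element's type byte and name are omitted here.)
as_binary_double = struct.pack("<d", value)

print(repr(as_json_text))       # prints: '5.0'
print(len(as_binary_double))    # prints: 8
print(struct.unpack("<d", as_binary_double)[0] == value)   # prints: True
```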
Using the raw "sequence of fields without any formatting" format is not
great, I agree - but then pretending that the template can be an
arbitrary JSON format and we parse it intelligently is not great either.
However that's definitely open to a change.
I thought I saw you saying that you wanted to send JSON to the database.
If that is the case, then let the sysadmin create the JSON and insert
that.
If you are wanting something other than JSON to put into the database,
it's still probably better for the sysadmin to be able to specify things
and you just do what the user wants.
It's common for people to want to have rules along the lines of
(not rsyslog syntax)
if sourceip in $list then set tag X in output log to a fixed value
If you end up creating your own template language, you will have to
support rules like this yourself, instead of relying on the one that
the users are already familiar with, and that gets extended to cover new
things without you having to duplicate the effort. This includes the
ability to dump all properties out as JSON and is going to include
abilities to modify the fields and leave some out of the output in the
future (as part of the entire lumberjack-related effort).
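
The kind of rule mentioned above ("if sourceip in $list then set tag X ...") can be sketched in Python as follows. This is not rsyslog syntax and not how rsyslog implements rules; the property name "sourceip", the tag name "X", and the helper apply_rule are assumptions made up for the illustration.

```python
def apply_rule(record, source_ips, tag_name, tag_value):
    """Return a copy of record with tag_name set if its source IP is in the list."""
    if record.get("sourceip") in source_ips:
        return {**record, tag_name: tag_value}
    return record

# Hypothetical list of addresses and a hypothetical event.
watched = {"10.0.0.5", "10.0.0.6"}
event = {"sourceip": "10.0.0.5", "msg": "login failed"}

tagged = apply_rule(event, watched, "X", "fixed-value")
untouched = apply_rule({"sourceip": "192.0.2.1", "msg": "ok"}, watched, "X", "fixed-value")
```

Here tagged gains the extra field and untouched comes back unchanged, which is all the rule asks for.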
This sounds interesting, is there any code I could look at?
Start by looking at the existing JSON formatting options for templates.
The key is that every other om uses a text string to pass the log data
from rsyslog to the module. Some of them interpret parts of the string and
strip parts off before sending the remainder to the destination (the UDP
forgery module, for example). Others take the data passed in to them and
wrap it in other needed stuff to send it out (the email module, for
example). But I don't know of any that ignore the text string passed to
them and create the message directly from the properties.
- What to do about non-string values? mongodb recognizes different
types, and it would be good to use the native one (so that numbers
instead of strings could be compared - note that there is no automatic
type conversion when comparing different types in mongodb). Non-string
types can be handled on output (by adding
"format-as-int"/"format-as-date" options), but AFAICS on input /
mmjsonparse everything is treated as strings. Is it at this point
realistic to think about preserving the type of data as presented on
input while reformatting it (e.g. by using mmjsonparse, and a template
with $!all-json and some of the above-mentioned field "edits")? Or is
rsyslog so fundamentally based on strings that this would take too much
work? (There is always the option to preserve types simply by treating
the JSON as unmodified plaintext).
Remember that the input to rsyslog is strings to start with (with the
exception of internally generated metadata). To get it to be anything
other than a string is going to require converting it. There's nothing
preventing you from getting a JSON string from rsyslog and optimizing it
by converting the data from strings to a more compact format for storage.
Output modules to transport the data from system to system via JSON will
be doing exactly the same thing.
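
As a rough sketch of that conversion step, assuming the module receives a flat JSON object whose values are all strings: the Python below tries int, then float, and otherwise keeps the string. The helpers coerce and coerce_record and the sample field names are hypothetical, and the guessing is only a heuristic (a numeric-looking ID that should stay a string would be converted too).

```python
import json

def coerce(value):
    """Convert a string to int or float when it parses cleanly; otherwise keep it."""
    if not isinstance(value, str):
        return value
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    return value

def coerce_record(json_text):
    """Parse a flat JSON object and coerce each top-level value."""
    return {key: coerce(val) for key, val in json.loads(json_text).items()}

print(coerce_record('{"pid": "4242", "load": "0.75", "msg": "ok"}'))
# prints: {'pid': 4242, 'load': 0.75, 'msg': 'ok'}
```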
The case I was thinking about is the same as above - keep the original
JSON, but add or modify a little.
So, if the input event is
@cee: {"field1": "string", "field2": 5.0, "field3": [1,2,3]}
I would like the data stored in MongoDB to directly correspond, e.g.
{"host_name":"server1", "field1": "string", "field2": 5.0, "field3": [1,2,3]}
not modify it to, say,
{"host_name":"server1", "field1": "string", "field2": "5.0", "field3": "[1,2,3]"}
And what makes you think that rsyslog is going to change it instead of
keeping it the same?
However, I suspect that the incoming @cee formatted message is going to
have all fields quoted.
MongoDB documentation seems to suggest that it doesn't support comparing
mixed field types much, so changing the value types sounds undesirable.
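
The example event above can be walked through in Python to show what "directly correspond" means for the value types. This is only a sketch of the desired outcome, not of rsyslog or mmjsonparse behaviour; the helper enrich is hypothetical, and Python's own comparison stands in as an analogy for a store that does no automatic type conversion.

```python
import json

CEE_COOKIE = "@cee:"

def enrich(message, host):
    """Parse an @cee message and put host_name first, overriding any existing one."""
    payload = message[len(CEE_COOKIE):] if message.startswith(CEE_COOKIE) else message
    record = json.loads(payload)
    enriched = {"host_name": host}
    enriched.update((k, v) for k, v in record.items() if k != "host_name")
    return enriched

event = '@cee: {"field1": "string", "field2": 5.0, "field3": [1,2,3]}'
doc = enrich(event, "server1")

print(doc)
# prints: {'host_name': 'server1', 'field1': 'string', 'field2': 5.0, 'field3': [1, 2, 3]}

# The number stays a number; a stringified "5.0" would not compare equal to it.
print(doc["field2"] == 5.0, "5.0" == 5.0)   # prints: True False
```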
You don't need to have your own template language to do this. Let the
sysadmin specify the format (go ahead and have a default format if the
sysadmin doesn't specify one).
David Lang