Hello,

It's been a couple of years, so I have been reading back through the normalization threads on the mailing list, searching for updates on how to deal with '-' (hyphen) placeholders for nulls when normalizing logs with a large number of positional elements/values.
Is there any option to set something like nullMarker="-", or would I still need to write a mess of a rulebase to handle massive log streams in which any of the fields can be null, marked with a '-' hyphen?

Originally I was thinking of a larger scope, where the feature could be used with different constants/strings, single characters, or alternate data types. The idea of an OR condition didn't go over well, but the ability to set a nullMarker is better than a thousand rules for a single kind of log.

Maybe both options make more sense at this point: a null marker when the placeholder is consistently something like a hyphen, or alternate types when the field could be char-to this or that, or perhaps a fallback to the word type would be OK. For example, if the field is an IP (ipv4) or a hostname (word), the word type would also parse the hyphen, in place of a nullMarker. A fallback to the word type could likewise help for other types such as IPs, quoted-string or word, which would extend beyond the hyphen nulls seen in Apache or proxy logs.

When forwarding the events, if nulls get dropped, there would be problems with endpoints that need data as key/value pairs (JSON) or as CSV with a fixed number of values, even "-" or an empty "", in order to map to predefined headers. Even though I like the idea of dropping fields that have the null value, it complicates sending specific data structures downstream.

Are the nullMarker or altType features already implemented, or are there any plans? If not, they could be very powerful and would cover a variety of use cases.

Any feedback or updated perspectives are greatly appreciated!

Thanks,
Kendall

On Jan 28, 2015 6:10 AM, "David Lang" <[email protected]> wrote:

On Wed, 28 Jan 2015, singh.janmejay wrote:

> I see what you are thinking of, but some things that may be worth thinking
> about before we decide:
>
> - Does it make sense for users to pack unrelated samples in the same
> rulebase?
>
> There are 3 problems with this:
> * The tree will become large, and back-tracking several unrelated
> branches will be wasteful (a condition in the ruleset which calls the
> action will be much more efficient, assuming the test is not very complex)

This isn't the case. The code doesn't go down through the rules one at a time. Instead, the entire rulebase gets compiled into a parse tree, and looking things up in it is O(1) in the number of rules (the cost depends only on the length of the line).

So it's actually much faster to have everything in one ruleset.

> * The rulebase will be composed of several unrelated rules, making it
> harder to read

You can separate them with blank lines, but having an include option to pull in multiple files would be good.

> * Multiple parse-trees may have to be maintained in order to satisfy
> all combinations of nullMarker (e.g. a non-leaf field, marked for
> null-handling in one sample, but not marked for it in the other) (so
> matching will become O(n) in the number of combinations). So it is some
> dev-work and a little bit of perf-overhead.

If the nullMarker can be specified only for part of the file, I don't think it will hurt much.

> - The alternative is to set nullMarker at top level in a rulebase (instead
> of being able to change it for every sample).
>
> But then the flexibility is slightly lowered.
>
> - If we go with an action-level param, it's useful in cases where one has
> a standard access-log format but the load-balancer level always has some
> fields (say upstream latency or upstream-ip) which app-layer access logs
> will not have.

It also means that there is something that only works when liblognorm is used by rsyslog, leaving out other users.

> This can use the same rulebase with nullMarker in one case, and without
> it in another.

I don't understand this. I see the nullMarker as being something that will apply to all logs from a given application. When would you want to sometimes parse with nullMarker and sometimes without?

David Lang

> Thoughts?
>
> On Wed, Jan 28, 2015 at 11:13 AM, David Lang <[email protected]> wrote:
>
>> I'm thinking that it needs to only apply to part of a ruleset. I can't see
>> why you would use the same rulebase with different values overall, but I
>> can easily see a rulebase that covers more than one type of logs needing
>> different values for the different types of logs.
>>
>> Remember that liblognorm is most effective if it has one ruleset to cover
>> everything you are looking at, rather than doing other conditionals and
>> then picking which ruleset to use.
>>
>> David Lang
>>
>> On Wed, 28 Jan 2015, singh.janmejay wrote:
>>
>>> I think the action parameter is the most flexible place to have it,
>>> because the same rulebase can be used with different values.
>>>
>>> Either a module or rulebase level param will be less flexible compared
>>> to this.
>>>
>>> --
>>> Regards,
>>> Janmejay
>>>
>>> PS: Please blame the typos in this mail on my phone's uncivilized soft
>>> keyboard sporting its not-so-smart-assist technology.
>>>
>>> On Jan 28, 2015 10:48 AM, "David Lang" <[email protected]> wrote:
>>>
>>>> On Wed, 28 Jan 2015, singh.janmejay wrote:
>>>>
>>>>> Ok, one way I can think of doing it: expose a parameter at
>>>>> action/module level which turns on defaulting and picks a default
>>>>> string.
>>>>>
>>>>> Eg.
>>>>>
>>>>> action(type="mmnormalize" nullMarker="-")
>>>>>
>>>>> Where nullMarker is a string (not a char).
>>>>>
>>>>> Whenever a "-" is encountered and a field is expected, it should skip
>>>>> the key (the key will not be present at all) and continue matching
>>>>> from the next token onwards.
>>>>>
>>>>> Thoughts?
>>>>
>>>> This needs to be something in the liblognorm config, not in rsyslog.
>>>> Different types of logs would have different nullMarker strings.
>>>>
>>>> With that adjustment, I think it's a good idea.
>>>>
>>>> David Lang
>>>>
>>>>> --
>>>>> Regards,
>>>>> Janmejay
>>>>>
>>>>> On Jan 28, 2015 6:38 AM, "David Lang" <[email protected]> wrote:
>>>>>
>>>>>> On Wed, 28 Jan 2015, singh.janmejay wrote:
>>>>>>
>>>>>>> Maybe it'll be useful to discuss what you want to achieve with such
>>>>>>> representations of a sample. I mean, if possible, take a few samples
>>>>>>> from your existing rulebase which you think highlight the problem(s)
>>>>>>> you are facing.
>>>>>>
>>>>>> I think the example is the Apache logs, where Apache either puts a
>>>>>> value, or it puts a placeholder '-'.
>>>>>>
>>>>>> If you want to capture a specific type (number or IP address, for
>>>>>> example), you won't match a log entry that has a - in that field.
>>>>>>
>>>>>> If there are only a couple of fields that are like this, you can list
>>>>>> all the combinations in the ruleset, but if you have a lot of fields
>>>>>> like this, the combinatorial explosion would make for a LOT of rules.
>>>>>>
>>>>>> So I don't think he really needs a generic 'or' allowing any types to
>>>>>> be combined as much as a way to say "this field could be this type or
>>>>>> this constant".
>>>>>>
>>>>>> David Lang
>>>>>> _______________________________________________
>>>>>> rsyslog mailing list
>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>> http://www.rsyslog.com/professional-services/
>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>>>> POST if you DON'T LIKE THAT.
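
To put a rough number on the combinatorial explosion mentioned in the quoted text, here is a minimal sketch. It only generates rule text and does not call liblognorm. The field names and the three-field layout are made up for illustration, and the rule lines use the plain `%name:type%` notation with an empty tag. Every field that may be either a typed value or the literal "-" doubles the number of rules needed.

```python
from itertools import product

# Hypothetical positional fields; each may hold a typed value or "-".
FIELDS = [("clientip", "ipv4"), ("status", "number"), ("bytes", "number")]


def rule_variants(fields, marker="-"):
    """Return one rule line per combination of typed value vs. literal marker."""
    rules = []
    for mask in product((False, True), repeat=len(fields)):
        parts = [
            marker if is_null else f"%{name}:{ftype}%"
            for (name, ftype), is_null in zip(fields, mask)
        ]
        rules.append("rule=:" + " ".join(parts))
    return rules


rules = rule_variants(FIELDS)
print(len(rules))   # 2 ** 3 combinations
print(rules[0])     # every field typed
print(rules[-1])    # every field replaced by the marker
```

With 3 nullable fields that is 8 rules; with 12 it is already 4096, which is why a per-field "this type or this constant" notation would be so much easier to maintain.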
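
On the downstream concern about dropped nulls raised at the top of this message, the following is a rough sketch of the two output shapes involved. It is not liblognorm or rsyslog behaviour. It assumes the event was already parsed with a permissive type such as word, so that "-" shows up as an ordinary value, and the helper names, headers and sample event are hypothetical.

```python
import csv
import io
import json

NULL_MARKER = "-"  # assumed placeholder, as in Apache-style logs


def drop_nulls(event, marker=NULL_MARKER):
    """Return a copy of the event without keys whose value is the marker."""
    return {k: v for k, v in event.items() if v != marker}


def to_csv_row(event, headers, empty=""):
    """Render a fixed-width CSV row; keys missing from the event become `empty`."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="")
    writer.writerow([event.get(h, empty) for h in headers])
    return buf.getvalue()


HEADERS = ["clientip", "user", "bytes"]
parsed = {"clientip": "192.0.2.10", "user": "-", "bytes": "-"}

compact = drop_nulls(parsed)
print(json.dumps(compact))           # {"clientip": "192.0.2.10"}
print(to_csv_row(compact, HEADERS))  # 192.0.2.10,,
```

The JSON consumer is happy with the compact form, but the CSV consumer only works because the header list is re-applied after the fact. If the normalizer drops the keys and nothing downstream knows the expected headers, the column positions are lost, so an option to keep the key with an empty or "-" value would still be needed.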

