As a short-term solution I'm working on a small service (in Go) that accepts logs over TCP, replaces characters in the JSON field names of a @cee syslog line, and then forwards the line to another syslog destination. In tests on my laptop it handles modifying ~50,000 reasonably sized log lines a second per connection. It handles TCP connection issues gracefully, and I'll test it under adverse circumstances to make sure it's reasonably robust. I personally find this preferable to deploying logstash just to substitute one character. I'll release it as open source this week in case anyone else needs an immediate solution to this problem like I do.
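For readers who want the same stopgap now, here is a minimal sketch of the per-line field-name rewrite described above. It is not the service itself. It assumes the following:

- Go's standard `encoding/json` package.
- A single `@cee:` cookie followed by one JSON object.
- The substitution is `.` to `_`. The message doesn't say which character is replaced, so this is a guess.

`rewriteCEE` and `renameKeys` are hypothetical names, and the TCP accept/forward loop is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// ceeCookie marks the start of the JSON payload in a @cee syslog message.
const ceeCookie = "@cee:"

// renameKeys walks a decoded JSON value and replaces every occurrence of
// oldChar with newChar in object field names. Values are left untouched.
func renameKeys(v interface{}, oldChar, newChar string) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			out[strings.ReplaceAll(k, oldChar, newChar)] = renameKeys(val, oldChar, newChar)
		}
		return out
	case []interface{}:
		for i, val := range t {
			t[i] = renameKeys(val, oldChar, newChar)
		}
		return t
	default:
		return v
	}
}

// rewriteCEE rewrites the field names in the JSON part of a @cee line.
// Lines without the cookie are returned unchanged. Only the first JSON
// value after the cookie is kept; anything trailing it is dropped.
func rewriteCEE(line, oldChar, newChar string) (string, error) {
	idx := strings.Index(line, ceeCookie)
	if idx < 0 {
		return line, nil
	}
	prefix, payload := line[:idx+len(ceeCookie)], line[idx+len(ceeCookie):]

	dec := json.NewDecoder(strings.NewReader(payload))
	dec.UseNumber() // keep numbers as written instead of converting to float64
	var doc interface{}
	if err := dec.Decode(&doc); err != nil {
		return "", err
	}

	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // don't turn <, > and & into \u003c etc.
	if err := enc.Encode(renameKeys(doc, oldChar, newChar)); err != nil {
		return "", err
	}
	// Encode appends a newline; drop it so the syslog line stays on one line.
	return prefix + strings.TrimRight(buf.String(), "\n"), nil
}

func main() {
	line := `<14>Dec  6 14:51:00 host app: @cee:{"http.status":200,"req":{"user.agent":"curl"}}`
	// ASSUMPTION: '.' -> '_' is the substitution; the original doesn't name it.
	out, err := rewriteCEE(line, ".", "_")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints: <14>Dec  6 14:51:00 host app: @cee:{"http_status":200,"req":{"user_agent":"curl"}}
}
```

Keep two side effects of this approach in mind:

- Round-tripping through a map re-emits keys in sorted order.
- Two names that collide after substitution (`a.b` and `a_b`) silently keep only one value, and which one survives depends on map iteration order.

A real deployment would need to decide how to handle both.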
It's less than ideal (ideally Elasticsearch would accept any field name JSON allows rather than a subset of those characters), but it solves the immediate problem for us.

Cheers,
Brian

On Sun, Dec 6, 2015 at 2:51 PM, David Lang <da...@lang.hm> wrote:
> On Sat, 5 Dec 2015, Peter Portante wrote:
>
>> On Sat, Dec 5, 2015 at 5:03 PM, David Lang <da...@lang.hm> wrote:
>>
>>> we really need mmscrubnames or similar
>>>
>>> 1. change all names to lower case
>>> 2. replace characters that rsyslog doesn't allow in names with something
>>> 3. allow other characters to be added to the list to be replaced
>>> 4. change names that are foo!bar into multi-layer structures
>>> 5. handle the case where these changes create multiple objects with the
>>> same name (probably by appending a string until there are no longer
>>> conflicts)
>>>
>>> #1 may be able to go away in a decade or so if we allow case sensitive
>>> names as an option
>>
>> Don't we need to make this go away sooner than later? If rsyslog is the
>> link in the chain that prevents someone from getting the key names they
>> expect into ES, won't they find something else to replace that link?
>>
>> I have made available RPMs for EPEL 7 (which should work on RHEL 7 and
>> CentOS 7), and Fedora 21, 22, and 23. Why not make the effort to find out
>> what breaks, and put in a switch so that folks can opt-in to case-sensitive
>> names in config files? I'd be happy to implement the switch, but would
>> need help verifying existing configurations work.
>
> this will break some existing configs, won't it? If someone has something
> that's assuming everything is squished to lower case, and it becomes case
> sensitive, won't that break?
>
> We can add the new case sensitivity as an option quickly, but can't make
> it the default for quite a while (a cycle or two of the enterprise distros)
>
>>> #2 needs to be done on the actual variable names, not just on the ES
>>> output so that the variables can be accessed and manipulated in rsyslog
>>
>> Why do we need to do this? Is this because we need to reference them in
>> the configuration files? If so, why not provide an escape syntax for the
>> configuration file?
>>
>> Do we really want rsyslog in the position where it adds restrictions to
>> the data handling pipeline because of how it operates? I think we all agree
>> that an mmscrubnames module would be good to help put rsyslog in the
>> position of transforming data from one source to another in the overall
>> pipeline.
>
> AFAIK, JSON imposes no limits on field names, so any strange character (or
> unicode character, or even control character) could be part of a field
> name. And even if the JSON spec imposes some limits, do the libraries
> impose such limits in practice?
>
> I don't think it makes sense to support all of this in rsyslog, I think
> it's reasonable to impose something sane. Other log handling software does
> this (for example, logstash doesn't allow '.' in the name, but also is case
> insensitive :-)
>
>>> and finally, #4 is needed to allow the work-around for problems like ES
>>> has.
>>
>> I am not sure I follow why this allows us to work-around problems like ES
>> has.
>>
>> The dots in field names are confusing and ambiguous in ES because you can
>> reference a hierarchical set of objects in the json objects indexed. So if
>> one has a field name with dots in it in one document and another document
>> in the index has a hierarchy with sub objects, then it is ambiguous which
>> we are dealing with, if I understand the problem correctly.
>
> Ok, that explains why this is an issue, it makes sense. We have the same
> problem with '!'.
> It's a problem in ES because it's a new requirement, breaking existing
> input.
>
> But #4 would let us say that '.' is an illegal character, along with
> control characters, anything above plain ASCII, and other punctuation
> characters we don't allow and get them replaced by something we do allow.
>
>> Folks can stay with ES 1.7 if they need the dots in names.
>
> not long term.
>
> David Lang
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.