I remember seeing the MaxMessageSize parameter, but I thought it was a legacy configuration setting per http://www.rsyslog.com/doc/v8-stable/configuration/global/index.html. I realize now that what is legacy is the format, not the setting itself.
I didn't set this parameter and it's not in rsyslog.conf, so per the same document the default is 8K. How, then, could Elasticsearch complain about receiving a much larger message (50K), which is what started this whole discussion?

Thanks,

Alec

On Thu, Jul 7, 2016 at 3:46 PM, David Lang <[email protected]> wrote:

> Per Rainer, the 10k limit is in the normalizer tool, not in the library.
> In rsyslog there is a maxmessagesize parameter. That is the limiting factor
> even with the existing version.
>
> the new version of liblognorm/mmnormalize will work with existing version
> 1 rulesets.
>
> David Lang
>
> On Thu, 7 Jul 2016, Alec Swan wrote:
>
>> So, does this mean that with the new rsyslog 8.20 and new liblognorm
>> version I will still be able to continue using "version 1" in my rules and
>> not run into 10K limit? Or would I have to switch to version 2?
>>
>> Thanks,
>>
>> Alec
>>
>> On Thu, Jul 7, 2016 at 1:31 PM, David Lang <[email protected]> wrote:
>>
>>> As I understand Rainer's reply, if you compile from the current liblognorm
>>> master you will not have that limitation. This new branch will be released
>>> shortly (along with rsyslog 8.20)
>>>
>>> David Lang
>>>
>>> On Thu, 7 Jul 2016, Alec Swan wrote:
>>>
>>>> Date: Thu, 7 Jul 2016 12:57:51 -0600
>>>> From: Alec Swan <[email protected]>
>>>> Reply-To: rsyslog-users <[email protected]>
>>>> To: rsyslog-users <[email protected]>
>>>> Subject: Re: [rsyslog] Invalid JSON from mmnormalize/liblognorm/omelasticsearch
>>>>
>>>> The test I ran was using lognormalizer as shown below. So, I wasn't using
>>>> it with the normalizer tool included in rsyslog distribution. The test was
>>>> able to parse mylog.log under 10K and returned "2 unparsable entries" when
>>>> mylog.log was over 10K. Is there a way to increase this limit so that I can
>>>> process messages larger than 10K with rsyslog 8.19.0?
>>>>
>>>> This is the test I ran:
>>>> lognormalizer -U -r myrule.rb < mylog.log
>>>>
>>>> This is the content of myrule.rb:
>>>> version=1
>>>> rule=:%message:rest%
>>>>
>>>> This is the output from the test when mylog.log is over 10K:
>>>> { "originalmsg": "_THE_FIRST_10K_OF_TEXT_", "unparsed-data": "_THE_FIRST_10K_OF_TEXT_" }
>>>> { "originalmsg": "_REMAINING_TEXT_", "unparsed-data": "_REMAINING_TEXT_" }
>>>> 2 unparsable entries
>>>>
>>>> Thanks,
>>>>
>>>> Alec
>>>>
>>>> On Thu, Jul 7, 2016 at 5:23 AM, Rainer Gerhards <[email protected]> wrote:
>>>>
>>>>> 2016-07-07 3:54 GMT+02:00 David Lang <[email protected]>:
>>>>> > first I've heard of this, we'll need to have Rainer comment on this.
>>>>>
>>>>> there is no such 10k limit in liblognorm. HOWEVER, the normalizer tool
>>>>> that comes with it had such a limit until current master branch.
>>>>>
>>>>> If used with rsyslog, I would assume that the max message size is set to
>>>>> 10k.
>>>>>
>>>>> HTH
>>>>> Rainer
>>>>> >
>>>>> > David Lang
>>>>> >
>>>>> > On Wed, 6 Jul 2016, Alec Swan wrote:
>>>>> >
>>>>> >> Date: Wed, 6 Jul 2016 15:34:44 -0600
>>>>> >> From: Alec Swan <[email protected]>
>>>>> >> Reply-To: rsyslog-users <[email protected]>
>>>>> >> To: rsyslog-users <[email protected]>
>>>>> >> Subject: Re: [rsyslog] Invalid JSON from mmnormalize/liblognorm/omelasticsearch
>>>>> >>
>>>>> >> Dave, I tried using liblognorm to parse the log message and it looks like
>>>>> >> %rest% liblognorm type can only match up to 10240 characters. So, for
>>>>> >> example the following rule succeeds parsing 10239 character message, but
>>>>> >> fails with 10240.
>>>>> >>
>>>>> >> rule=:%message:rest%
>>>>> >>
>>>>> >> The particular log file I am parsing contains enormous log messages, e.g.
>>>>> >> 180,000 characters in a single line. So, is there really a limit on 10240
>>>>> >> characters in liblognorm? If so, what's the recommended way to handle
>>>>> >> parsing of extremely large messages?
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >> Alec
>>>>> >>
>>>>> >> On Wed, Jun 29, 2016 at 6:09 PM, David Lang <[email protected]> wrote:
>>>>> >>
>>>>> >>> This is helping narrow things down.
>>>>> >>>
>>>>> >>> I would have rsyslog write to a file with the template that you use to
>>>>> >>> send to elasticsearch.
>>>>> >>>
>>>>> >>> I would also use the liblognorm command-line tool to parse the file and
>>>>> >>> output json.
>>>>> >>>
>>>>> >>> let's try to see where it breaks.
>>>>> >>>
>>>>> >>> David Lang
>>>>> >>>
>>>>> >>> On Wed, 29 Jun 2016, Alec Swan wrote:
>>>>> >>>
>>>>> >>>> David, as you suggested, I extracted the log lines containing Hindi
>>>>> >>>> characters in a separate file and ran "file -bi" which returned
>>>>> >>>> "text/plain; charset=utf-8". Which confirms that logs are written in
>>>>> >>>> UTF-8.
>>>>> >>>> Any thoughts what would cause rsyslog to send messages like
>>>>> >>>> "\u00E0\u00.4???
>>>>> >>>> Description in Hindi" causing Elasticsearch to throw an exception?
>>>>> >>>>
>>>>> >>>> Thanks,
>>>>> >>>>
>>>>> >>>> Alec
>>>>> >>>>
>>>>> >>>> On Wed, Jun 29, 2016 at 4:08 PM, alecswan <[email protected]> wrote:
>>>>> >>>>
>>>>> >>>>> I looked at the code that produces this log file and it's writing the log
>>>>> >>>>> with utf-8 encoding. What else could cause this problem? Could it be that
>>>>> >>>>> Hindi characters may require 3 bytes for encoding? Just grasping at straws
>>>>> >>>>> here ...
>>>>> >>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>>
>>>>> >>>>> Alec
>>>>> >>>>>
>>>>> >>>>> -------- Original message --------
>>>>> >>>>> From: David Lang
>>>>> >>>>> Date: 29/06/2016 2:00 PM (GMT-07:00)
>>>>> >>>>> To: rsyslog-users
>>>>> >>>>> Subject: Re: [rsyslog] Invalid JSON from mmnormalize/liblognorm/omelasticsearch
>>>>> >>>>>
>>>>> >>>>> On Wed, 29 Jun 2016, Alec Swan wrote:
>>>>> >>>>>
>>>>> >>>>> > I tried using mmutf8fix as shown below, but it didn't seem to fix the
>>>>> >>>>> > problem. What I am doing is monitoring a log file with imfile action,
>>>>> >>>>> > parsing it with mmnormalize and sending JSON to Elasticsearch with
>>>>> >>>>> > omelasticsearch.
>>>>> >>>>> >
>>>>> >>>>> > I check the encoding of the log file using "file -bi" and it says
>>>>> >>>>> > "text/plain; charset=us-ascii".
>>>>> >>>>>
>>>>> >>>>> > However, it contains some Hindi characters, which I assume are encoded with
>>>>> >>>>> > us-ascii.
>>>>> >>>>>
>>>>> >>>>> There is no way to encode Hindi characters as us-ascii. us-ascii is the most
>>>>> >>>>> basic character set, English upper case, lower case and punctuation only.
>>>>> >>>>>
>>>>> >>>>> So whatever character set it is in, it's not us-ascii
>>>>> >>>>>
>>>>> >>>>> > If I understand correctly,
>>>>> >>>>> > us-ascii is a subset of UTF-8. If this is the case, do I really need to use
>>>>> >>>>> > mmutf8fix?
>>>>> >>>>>
>>>>> >>>>> It all depends on what character set it's actually in. try making a copy of the
>>>>> >>>>> file that has the Hindi characters near the beginning of it and try the file -bi
>>>>> >>>>> again, see if it gives a more accurate answer.
>>>>> >>>>>
>>>>> >>>>> otherwise, you will have to track down what's writing the messages and try to
>>>>> >>>>> set the character set there (or at least find out what character set it's
>>>>> >>>>> using)
>>>>> >>>>>
>>>>> >>>>> David Lang
>>>>> >>>>>
>>>>> >>>>> > To me it seems like the Hindi characters are UTF-8 encoded with 3-byte
>>>>> >>>>> > sequences and when they are received by Elasticsearch the byte sequence is
>>>>> >>>>> > incorrectly decoded to invalid Unicode sequence, such as "\u00.4". Is this
>>>>> >>>>> > plausible?
>>>>> >>>>> >
>>>>> >>>>> > module(load = "imfile")
>>>>> >>>>> > module(load="mmutf8fix")
>>>>> >>>>> > module(load = "mmnormalize")
>>>>> >>>>> > module(load = "omelasticsearch")
>>>>> >>>>> >
>>>>> >>>>> > input(type = "imfile" Ruleset="X" ...)
>>>>> >>>>> > ruleset(name = "X") {
>>>>> >>>>> >     action(type="mmutf8fix")
>>>>> >>>>> >     action(type = "mmnormalize" ...)
>>>>> >>>>> >     action(type = "omelasticsearch" ...)
>>>>> >>>>> > }
>>>>> >>>>> >
>>>>> >>>>> > Thanks,
>>>>> >>>>> >
>>>>> >>>>> > Alec
>>>>> >>>>> >
>>>>> >>>>> > On Tue, Jun 28, 2016 at 4:49 PM, Alec Swan <[email protected]> wrote:
>>>>> >>>>> >
>>>>> >>>>> >> Thanks for the suggestion, Dave. I noticed that on the client side the
>>>>> >>>>> >> log contained Hindi characters that got translated to "\u00E0\u00.4???\"
>>>>> >>>>> >> which eventually caused the error. I'll give mmutf8fix plugin a try.
>>>>> >>>>> >>
>>>>> >>>>> >> Thanks,
>>>>> >>>>> >>
>>>>> >>>>> >> Alec
>>>>> >>>>> >>
>>>>> >>>>> >> On Tue, Jun 28, 2016 at 3:24 PM, Dave Caplinger <[email protected]> wrote:
>>>>> >>>>> >>
>>>>> >>>>> >>> On Jun 28, 2016, at 4:04 PM, Alec Swan <[email protected]> wrote:
>>>>> >>>>> >>> >
>>>>> >>>>> >>> > I think the root cause of the problem is that there is an invalid UTF-8
>>>>> >>>>> >>> > sequence "\u00.4" in the value of the "message" field. In fact, I just
>>>>> >>>>> >>> > confirmed that {"message":"\u00.4"} is not a valid JSON on
>>>>> >>>>> >>> > http://jsonlint.com/.
>>>>> >>>>> >>>
>>>>> >>>>> >>> I've run into something similar where the original message source was
>>>>> >>>>> >>> sending Windows-1252 or other character set. Rsyslog doesn't know the
>>>>> >>>>> >>> incoming character set, so it doesn't know that it needs to be converted to
>>>>> >>>>> >>> UTF-8. (That particular input would receive logs from various sources, so
>>>>> >>>>> >>> the character set could vary per message).
>>>>> >>>>> >>>
>>>>> >>>>> >>> The fix we used was to add action(type="mmutf8fix") to the affected
>>>>> >>>>> >>> ruleset prior to any JSON template use. This isn't strictly accurate
>>>>> >>>>> >>> because you lose the 'invalid' character in the resulting string, but at
>>>>> >>>>> >>> least that string is JSON-safe. In the ideal case you'd know what the
>>>>> >>>>> >>> original character set was and explicitly convert it to UTF-8, but that wasn't
>>>>> >>>>> >>> practical in our use case.
>>>>> >>>>> >>>
>>>>> >>>>> >>> --
>>>>> >>>>> >>> Dave Caplinger | Director, Technical Product Management
>>>>> >>>>> >>> Solutionary — An NTT Group Security Company
>>>>> >>>>> >>>
>>>>> >>>>> >>> _______________________________________________
>>>>> >>>>> >>> rsyslog mailing list
>>>>> >>>>> >>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> >>>>> >>> http://www.rsyslog.com/professional-services/
>>>>> >>>>> >>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> >>>>> >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>>> >>>>> >>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>> >>>>> >>> DON'T LIKE THAT.
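For anyone finding this thread later: the jsonlint.com check mentioned earlier in the thread can also be reproduced locally. The following is only a minimal sketch using Python's standard `json` module; the helper name `is_valid_json` and the sample strings are made up for illustration and are not taken from the actual logs.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Raw strings keep the backslash, so the JSON parser sees the \u escape itself.
# A \u escape must be followed by exactly four hex digits; "00.4" is not hex.
broken = r'{"message":"\u00.4"}'

# Hypothetical well-formed counterpart: U+0905 is a Devanagari letter.
well_formed = r'{"message":"\u0905"}'

print(is_valid_json(broken))       # False
print(is_valid_json(well_formed))  # True
```

Running a check like this against the file that rsyslog writes with the Elasticsearch template should show which lines would be rejected before they ever reach Elasticsearch.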

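On the question raised in the thread of whether Hindi characters need 3 bytes in UTF-8, and why they cannot be us-ascii, here is a small hedged sketch. The sample letter (U+0905, DEVANAGARI LETTER A) is my own choice and does not come from the log file discussed here; the function names are hypothetical.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as a hex string."""
    return text.encode("utf-8").hex()


def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded as us-ascii."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


sample = "\u0905"  # DEVANAGARI LETTER A, chosen only for illustration

print(len(sample.encode("utf-8")))  # 3
print(utf8_hex(sample))             # e0a485
print(fits_in_ascii(sample))        # False
print(fits_in_ascii("plain text"))  # True
```

The leading byte 0xE0 lines up with the "\u00E0" seen in the garbled output, which would be consistent with the bytes being escaped one at a time somewhere along the way, but that is a guess rather than something confirmed in this thread.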
