I remember seeing the MaxMessageSize parameter, but I thought it was a legacy
configuration setting per
http://www.rsyslog.com/doc/v8-stable/configuration/global/index.html. I
realize now that only the format shown there is legacy, not the setting itself.

I didn't set this parameter and it's not in rsyslog.conf, so per the same
document the default is 8K. How could Elasticsearch then complain about
receiving a much larger message (50K), which is what started this whole
discussion?
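
One way to narrow this down is to check how long the lines in the monitored
file really are, counted in bytes rather than characters. Below is a minimal
Python sketch, assuming the 8K default quoted above. The function name
oversized_lines and the sample data are hypothetical and not part of rsyslog
or liblognorm.

```python
import io

# 8K: the default MaxMessageSize according to the document linked above.
DEFAULT_LIMIT = 8 * 1024


def oversized_lines(stream, limit=DEFAULT_LIMIT):
    """Return (line_number, byte_length) for every line longer than `limit` bytes.

    `stream` must be a binary file-like object so lengths are counted in
    bytes, not characters.
    """
    found = []
    for number, raw in enumerate(stream, start=1):
        length = len(raw.rstrip(b"\r\n"))  # ignore the line terminator
        if length > limit:
            found.append((number, length))
    return found


# Hypothetical sample: a short ASCII line, then 3000 Devanagari characters.
# Each of those characters takes 3 bytes in UTF-8, so the second line is
# 9000 bytes long even though it has only 3000 characters.
sample = io.BytesIO(b"short line\n" + ("\u0939" * 3000).encode("utf-8") + b"\n")
report = oversized_lines(sample)
```

For a real file, open it in binary mode and pass the handle in, for example
oversized_lines(open("mylog.log", "rb")), with mylog.log standing in for the
monitored log file.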

Thanks,

Alec

On Thu, Jul 7, 2016 at 3:46 PM, David Lang <[email protected]> wrote:

> Per Rainer, the 10k limit is in the normalizer tool, not in the library.
> In rsyslog there is a maxmessagesize parameter. That is the limiting factor
> even with the existing version.
>
> the new version of liblognorm/mmnormalize will work with existing version
> 1 rulesets.
>
>
> David Lang
>
> On Thu, 7 Jul 2016, Alec Swan wrote:
>
>> So, does this mean that with the new rsyslog 8.20 and the new liblognorm
>> version I will still be able to continue using "version 1" in my rules and
>> not run into the 10K limit? Or would I have to switch to version 2?
>>
>> Thanks,
>>
>> Alec
>>
>> On Thu, Jul 7, 2016 at 1:31 PM, David Lang <[email protected]> wrote:
>>
>>> As I understand Rainer's reply, if you compile from the current liblognorm
>>> master you will not have that limitation. This new branch will be released
>>> shortly (along with rsyslog 8.20).
>>>
>>> David Lang
>>>
>>> On Thu, 7 Jul 2016, Alec Swan wrote:
>>>
>>> Date: Thu, 7 Jul 2016 12:57:51 -0600
>>>
>>>>
>>>> From: Alec Swan <[email protected]>
>>>> Reply-To: rsyslog-users <[email protected]>
>>>> To: rsyslog-users <[email protected]>
>>>> Subject: Re: [rsyslog] Invalid JSON from
>>>> mmnormalize/liblognorm/omelasticsearch
>>>>
>>>> The test I ran was using lognormalizer as shown below. So, I wasn't using
>>>> it with the normalizer tool included in the rsyslog distribution. The test
>>>> was able to parse mylog.log under 10K and returned "2 unparsable entries"
>>>> when mylog.log was over 10K. Is there a way to increase this limit so that
>>>> I can process messages larger than 10K with rsyslog 8.19.0?
>>>>
>>>> This is the test I ran:
>>>>  lognormalizer -U -r myrule.rb < mylog.log
>>>>
>>>> This is the content of myrule.rb:
>>>>  version=1
>>>>  rule=:%message:rest%
>>>>
>>>> This is the output from the test when mylog.log is over 10K:
>>>>  { "originalmsg": "_THE_FIRST_10K_OF_TEXT_", "unparsed-data": "_THE_FIRST_10K_OF_TEXT_" }
>>>>  { "originalmsg": "_REMAINING_TEXT_", "unparsed-data": "_REMAINING_TEXT_" }
>>>>  2 unparsable entries
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Alec
>>>>
>>>> On Thu, Jul 7, 2016 at 5:23 AM, Rainer Gerhards <[email protected]> wrote:
>>>>
>>>> 2016-07-07 3:54 GMT+02:00 David Lang <[email protected]>:
>>>>
>>>>> > first I've heard of this, we'll need to have Rainer comment on this.
>>>>>
>>>>> there is no such 10k limit in liblognorm. HOWEVER, the normalizer tool
>>>>> that comes with it had such a limit until current master branch.
>>>>>
>>>>> If used with rsyslog, I would assume that the max message size is set to
>>>>> 10k.
>>>>>
>>>>> HTH
>>>>> Rainer
>>>>> >
>>>>> > David Lang
>>>>> >
>>>>> > On Wed, 6 Jul 2016, Alec Swan wrote:
>>>>> >
>>>>> >> Date: Wed, 6 Jul 2016 15:34:44 -0600
>>>>> >> From: Alec Swan <[email protected]>
>>>>> >> Reply-To: rsyslog-users <[email protected]>
>>>>> >> To: rsyslog-users <[email protected]>
>>>>> >>
>>>>> >> Subject: Re: [rsyslog] Invalid JSON from
>>>>> >> mmnormalize/liblognorm/omelasticsearch
>>>>> >>
>>>>> >> Dave, I tried using liblognorm to parse the log message and it looks like
>>>>> >> the %rest% liblognorm type can only match up to 10240 characters. So, for
>>>>> >> example, the following rule succeeds in parsing a 10239-character message
>>>>> >> but fails with 10240.
>>>>> >>
>>>>> >> rule=:%message:rest%
>>>>> >>
>>>>> >> The particular log file I am parsing contains enormous log messages, e.g.
>>>>> >> 180,000 characters in a single line. So, is there really a limit of 10240
>>>>> >> characters in liblognorm? If so, what's the recommended way to handle
>>>>> >> parsing of extremely large messages?
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >> Alec
>>>>> >>
>>>>> >> On Wed, Jun 29, 2016 at 6:09 PM, David Lang <[email protected]> wrote:
>>>>> >>
>>>>> >>> This is helping narrow things down.
>>>>> >>>
>>>>> >>> I would have rsyslog write to a file with the template that you use to
>>>>> >>> send to elasticsearch.
>>>>> >>>
>>>>> >>> I would also use the liblognorm command-line tool to parse the file and
>>>>> >>> output json.
>>>>> >>>
>>>>> >>> let's try to see where it breaks.
>>>>> >>>
>>>>> >>> David Lang
>>>>> >>>
>>>>> >>> On Wed, 29 Jun 2016, Alec Swan wrote:
>>>>> >>>
>>>>> >>>> David, as you suggested, I extracted the log lines containing Hindi
>>>>> >>>> characters into a separate file and ran "file -bi", which returned
>>>>> >>>> "text/plain; charset=utf-8". This confirms that the logs are written in
>>>>> >>>> UTF-8.
>>>>> >>>> Any thoughts on what would cause rsyslog to send messages like
>>>>> >>>> "\u00E0\u00.4??? Description in Hindi", causing Elasticsearch to throw an
>>>>> >>>> exception?
>>>>> >>>>
>>>>> >>>> Thanks,
>>>>> >>>>
>>>>> >>>> Alec
>>>>> >>>>
>>>>> >>>> On Wed, Jun 29, 2016 at 4:08 PM, alecswan <[email protected]> wrote:
>>>>> >>>>
>>>>> >>>>> I looked at the code that produces this log file and it's writing the log
>>>>> >>>>> with utf-8 encoding. What else could cause this problem? Could it be that
>>>>> >>>>> Hindi characters may require 3 bytes for encoding? Just grasping at straws
>>>>> >>>>> here ...
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>>
>>>>> >>>>> Alec
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> -------- Original message --------
>>>>> >>>>> From: David Lang
>>>>> >>>>> Date:29/06/2016 2:00 PM (GMT-07:00)
>>>>> >>>>> To: rsyslog-users
>>>>> >>>>> Subject: Re: [rsyslog] Invalid JSON from
>>>>> >>>>> mmnormalize/liblognorm/omelasticsearch
>>>>> >>>>>
>>>>> >>>>> On Wed, 29 Jun 2016, Alec Swan wrote:
>>>>> >>>>>
>>>>> >>>>> > I tried using mmutf8fix as shown below, but it didn't seem to fix the
>>>>> >>>>> > problem. What I am doing is monitoring a log file with the imfile input,
>>>>> >>>>> > parsing it with mmnormalize and sending JSON to Elasticsearch with
>>>>> >>>>> > omelasticsearch.
>>>>> >>>>> >
>>>>> >>>>> > I checked the encoding of the log file using "file -bi" and it says
>>>>> >>>>> > "text/plain; charset=us-ascii".
>>>>> >>>>>
>>>>> >>>>> > However, it contains some Hindi characters, which I assume are encoded
>>>>> >>>>> > with us-ascii.
>>>>> >>>>>
>>>>> >>>>> There is no way to encode Hindi characters as us-ascii. us-ascii is the most
>>>>> >>>>> basic character set: English upper case, lower case and punctuation only.
>>>>> >>>>>
>>>>> >>>>> So whatever character set it is in, it's not us-ascii.
>>>>> >>>>>
>>>>> >>>>> > If I understand correctly,
>>>>> >>>>> > us-ascii is a subset of UTF-8. If this is the case, do I really need to
>>>>> >>>>> > use mmutf8fix?
>>>>> >>>>>
>>>>> >>>>> It all depends on what character set it's actually in. Try making a copy
>>>>> >>>>> of the file that has the Hindi characters near the beginning of it and try
>>>>> >>>>> file -bi again to see if it gives a more accurate answer.
>>>>> >>>>>
>>>>> >>>>> Otherwise, you will have to track down what's writing the messages and try
>>>>> >>>>> to set the character set there (or at least find out what character set
>>>>> >>>>> it's using).
>>>>> >>>>>
>>>>> >>>>> David Lang
>>>>> >>>>>
>>>>> >>>>> > To me it seems like the Hindi characters are UTF-8 encoded with 3-byte
>>>>> >>>>> > sequences and when they are received by Elasticsearch the byte sequence
>>>>> >>>>> > is incorrectly decoded to an invalid Unicode sequence, such as "\u00.4".
>>>>> >>>>> > Is this plausible?
>>>>> >>>>> >
>>>>> >>>>> > module(load = "imfile")
>>>>> >>>>> > module(load="mmutf8fix")
>>>>> >>>>> > module(load = "mmnormalize")
>>>>> >>>>> > module(load = "omelasticsearch")
>>>>> >>>>> >
>>>>> >>>>> > input(type = "imfile" Ruleset="X" ...)
>>>>> >>>>> > ruleset(name = "X") {
>>>>> >>>>> >  action(type="mmutf8fix")
>>>>> >>>>> >  action(type = "mmnormalize" ...)
>>>>> >>>>> >  action(type = "omelasticsearch" ...)
>>>>> >>>>> > }
>>>>> >>>>> >
>>>>> >>>>> > Thanks,
>>>>> >>>>> >
>>>>> >>>>> > Alec
>>>>> >>>>> >
>>>>> >>>>> > On Tue, Jun 28, 2016 at 4:49 PM, Alec Swan <[email protected]> wrote:
>>>>> >>>>> >
>>>>> >>>>> >> Thanks for the suggestion, Dave.  I noticed that on the client side the
>>>>> >>>>> >> log contained Hindi characters that got translated to "\u00E0\u00.4???\"
>>>>> >>>>> >> which eventually caused the error. I'll give the mmutf8fix plugin a try.
>>>>> >>>>> >>
>>>>> >>>>> >> Thanks,
>>>>> >>>>> >>
>>>>> >>>>> >> Alec
>>>>> >>>>> >>
>>>>> >>>>> >> On Tue, Jun 28, 2016 at 3:24 PM, Dave Caplinger <
>>>>> >>>>> >> [email protected]> wrote:
>>>>> >>>>> >>
>>>>> >>>>> >>> On Jun 28, 2016, at 4:04 PM, Alec Swan <[email protected]> wrote:
>>>>> >>>>> >>> >
>>>>> >>>>> >>> > I think the root cause of the problem is that there is an invalid UTF-8
>>>>> >>>>> >>> > sequence "\u00.4" in the value of the "message" field. In fact, I just
>>>>> >>>>> >>> > confirmed that {"message":"\u00.4"} is not valid JSON on
>>>>> >>>>> >>> > http://jsonlint.com/.
>>>>> >>>>> >>>
>>>>> >>>>> >>> I've run into something similar where the original message source was
>>>>> >>>>> >>> sending Windows-1252 or another character set.  Rsyslog doesn't know the
>>>>> >>>>> >>> incoming character set, so it doesn't know that it needs to be converted
>>>>> >>>>> >>> to UTF-8. (That particular input would receive logs from various sources,
>>>>> >>>>> >>> so the character set could vary per message.)
>>>>> >>>>> >>>
>>>>> >>>>> >>> The fix we used was to add action(type="mmutf8fix") to the affected
>>>>> >>>>> >>> ruleset prior to any JSON template use.  This isn't strictly accurate
>>>>> >>>>> >>> because you lose the 'invalid' character in the resulting string, but at
>>>>> >>>>> >>> least that string is JSON-safe.  In the ideal case you'd know what the
>>>>> >>>>> >>> original character set was and explicitly convert it to UTF-8, but that
>>>>> >>>>> >>> wasn't practical in our use case.
>>>>> >>>>> >>>
>>>>> >>>>> >>> --
>>>>> >>>>> >>> Dave Caplinger | Director, Technical Product Management
>>>>> >>>>> >>> Solutionary — An NTT Group Security Company
>>>>> >>>>> >>>
>>>>> >>>>> >>> _______________________________________________
>>>>> >>>>> >>> rsyslog mailing list
>>>>> >>>>> >>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> >>>>> >>> http://www.rsyslog.com/professional-services/
>>>>> >>>>> >>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> >>>>> >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>>> >>>>> >>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>> >>>>> >>> DON'T LIKE THAT.
>>>>> >>>>> >>>
