Per Rainer, the 10k limit is in the normalizer tool, not in the library. In rsyslog, the maxMessageSize parameter is the limiting factor, even with the existing version.

The new version of liblognorm/mmnormalize will work with existing version 1 rulesets.

David Lang

On Thu, 7 Jul 2016, Alec Swan wrote:

So, does this mean that with the new rsyslog 8.20 and new liblognorm
version I will still be able to continue using "version 1" in my rules and
not run into 10K limit? Or would I have to switch to version 2?

Thanks,

Alec

On Thu, Jul 7, 2016 at 1:31 PM, David Lang <[email protected]> wrote:

As I understand Rainer's reply, if you compile from the current liblognorm
master you will not have that limitation. This new branch will be released
shortly (along with rsyslog 8.20)

David Lang

On Thu, 7 Jul 2016, Alec Swan wrote:

Date: Thu, 7 Jul 2016 12:57:51 -0600
From: Alec Swan <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] Invalid JSON from mmnormalize/liblognorm/omelasticsearch

The test I ran was using lognormalizer as shown below. So, I wasn't using
it with the normalizer tool included in rsyslog distribution. The test was
able to parse mylog.log under 10K and returned "2 unparsable entries" when
mylog.log was over 10K. Is there a way to increase this limit so that I can
process messages larger than 10K with rsyslog 8.19.0?

This is the test I ran:
 lognormalizer -U -r myrule.rb < mylog.log

This is the content of myrule.rb:
 version=1
 rule=:%message:rest%

This is the output from the test when mylog.log is over 10K:
 { "originalmsg": "_THE_FIRST_10K_OF_TEXT_", "unparsed-data": "_THE_FIRST_10K_OF_TEXT_"}
 { "originalmsg": "_REMAINING_TEXT_", "unparsed-data": "_REMAINING_TEXT_" }
 2 unparsable entries
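
To see which lines of a log file would hit the threshold described here, a minimal Python sketch follows. It is not part of liblognorm or rsyslog. The helper name oversized_lines is hypothetical, and counting UTF-8 bytes (rather than characters) against the 10240 figure from this thread is an assumption.

```python
import io

LIMIT = 10240  # threshold reported in this thread; assumed here to count bytes


def oversized_lines(stream, limit=LIMIT):
    """Yield (line_number, byte_length) for lines of at least `limit` bytes.

    `stream` is any binary file-like object. The trailing newline is not
    counted. This is a hypothetical helper, not a liblognorm function.
    """
    for number, raw in enumerate(stream, start=1):
        length = len(raw.rstrip(b"\r\n"))
        if length >= limit:
            yield number, length


if __name__ == "__main__":
    # Self-contained demo: one line just under the limit, one exactly at it.
    sample = io.BytesIO(b"a" * 10239 + b"\n" + b"b" * 10240 + b"\n")
    for number, length in oversized_lines(sample):
        print(f"line {number}: {length} bytes")
```

Against a real file, the same function can be called as oversized_lines(open("mylog.log", "rb")).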


Thanks,

Alec

On Thu, Jul 7, 2016 at 5:23 AM, Rainer Gerhards <[email protected]> wrote:

2016-07-07 3:54 GMT+02:00 David Lang <[email protected]>:
> first I've heard of this, we'll need to have Rainer comment on this.

there is no such 10k limit in liblognorm. HOWEVER, the normalizer tool
that comes with it had such a limit until current master branch.

If used with rsyslog, I would assume that the max message size is set to
10k.

HTH
Rainer
>
> David Lang
>
> On Wed, 6 Jul 2016, Alec Swan wrote:
>
>> Date: Wed, 6 Jul 2016 15:34:44 -0600
>> From: Alec Swan <[email protected]>
>> Reply-To: rsyslog-users <[email protected]>
>> To: rsyslog-users <[email protected]>
>>
>> Subject: Re: [rsyslog] Invalid JSON from
>> mmnormalize/liblognorm/omelasticsearch
>>
>> Dave, I tried using liblognorm to parse the log message and it looks like
>> the %rest% liblognorm type can only match up to 10240 characters. So, for
>> example, the following rule succeeds in parsing a 10239-character message, but
>> fails with 10240.
>>
>> rule=:%message:rest%
>>
>> The particular log file I am parsing contains enormous log messages, e.g.
>> 180,000 characters in a single line. So, is there really a limit of 10240
>> characters in liblognorm? If so, what's the recommended way to handle
>> parsing of extremely large messages?
>>
>> Thanks,
>>
>> Alec
>>
>> On Wed, Jun 29, 2016 at 6:09 PM, David Lang <[email protected]> wrote:
>>
>>> This is helping narrow things down.
>>>
>>> I would have rsyslog write to a file with the template that you use to
>>> send to elasticsearch.
>>>
>>> I would also use the liblognorm command-line tool to parse the file and
>>> output json.
>>>
>>> let's try to see where it breaks.
>>>
>>> David Lang
>>>
>>> On Wed, 29 Jun 2016, Alec Swan wrote:
>>>
>>>> David, as you suggested, I extracted the log lines containing Hindi
>>>> characters in a separate file and ran "file -bi" which returned
>>>> "text/plain; charset=utf-8". Which confirms that logs are written in
>>>> UTF-8.
>>>> Any thoughts what would cause rsyslog to send messages like
>>>> "\u00E0\u00.4??? Description in Hindi" causing Elasticsearch to throw
>>>> an exception?
>>>>
>>>> Thanks,
>>>>
>>>> Alec
>>>>
>>>> On Wed, Jun 29, 2016 at 4:08 PM, alecswan <[email protected]> wrote:
>>>>
>>>>> I looked at the code that produces this log file and it's writing the
>>>>> log with utf-8 encoding. What else could cause this problem? Could it be
>>>>> that Hindi characters may require 3 bytes for encoding? Just grasping at
>>>>> straws here ...
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Alec
>>>>>
>>>>>
>>>>> -------- Original message --------
>>>>> From: David Lang
>>>>> Date:29/06/2016 2:00 PM (GMT-07:00)
>>>>> To: rsyslog-users
>>>>> Subject: Re: [rsyslog] Invalid JSON from
>>>>> mmnormalize/liblognorm/omelasticsearch
>>>>>
>>>>> On Wed, 29 Jun 2016, Alec Swan wrote:
>>>>>
>>>>> > I tried using mmutf8fix as shown below, but it didn't seem to fix the
>>>>> > problem. What I am doing is monitoring a log file with imfile action,
>>>>> > parsing it with mmnormalize and sending JSON to Elasticsearch with
>>>>> > omelasticsearch.
>>>>> >
>>>>> > I check the encoding of the log file using "file -bi" and it says
>>>>> > "text/plain; charset=us-ascii".
>>>>>
>>>>> > However, it contains some Hindi characters, which I assume are
>>>>> > encoded with us-ascii.
>>>>>
>>>>> There is no way to encode Hindi characters as us-ascii. us-ascii is the
>>>>> most basic character set, English upper case, lower case and punctuation
>>>>> only.
>>>>>
>>>>> So whatever character set it is in, it's not us-ascii
>>>>>
>>>>> > If I understand correctly,
>>>>> > us-ascii is a subset of UTF-8. If this is the case, do I really need
>>>>> > to use mmutf8fix?
>>>>>
>>>>> It all depends on what character set it's actually in. Try making a
>>>>> copy of the file that has the Hindi characters near the beginning of it
>>>>> and try the file -bi again, see if it gives a more accurate answer.
>>>>>
>>>>> Otherwise, you will have to track down what's writing the messages and
>>>>> try to set the character set there (or at least find out what character
>>>>> set it's using)
>>>>>
>>>>> David Lang
>>>>>
>>>>> > To me it seems like the Hindi characters are UTF-8 encoded with
>>>>> > 3-byte sequences and when they are received by Elasticsearch the byte
>>>>> > sequence is incorrectly decoded to an invalid Unicode sequence, such as
>>>>> > "\u00.4". Is this plausible?
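
The two underlying facts can be checked without rsyslog. The following Python sketch uses only the standard library. The sample letter (U+0939) and the helper name is_valid_json are chosen for illustration and are not taken from the log discussed in this thread. It shows that a Devanagari letter occupies three bytes in UTF-8 and that a "\u00.4" escape makes a JSON document unparsable.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A Devanagari letter (U+0939), picked arbitrarily as a stand-in for the
# Hindi text; the real log content is not known here.
letter = "\u0939"
encoded = letter.encode("utf-8")
print(len(encoded), encoded.hex())  # byte count and hex form of the encoding

# A JSON \u escape needs exactly four hex digits, so "\u00.4" is rejected.
print(is_valid_json(r'{"message":"\u00.4"}'))

# A well-formed escape for the same letter is accepted.
print(is_valid_json(r'{"message":"\u0939"}'))
```

The first byte of that encoding is 0xE0, which at least matches the "\u00E0" prefix seen in the bad output; whether the bytes were really escaped one at a time somewhere in the pipeline is only a guess.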
>>>>> >
>>>>> > module(load = "imfile")
>>>>> > module(load="mmutf8fix")
>>>>> > module(load = "mmnormalize")
>>>>> > module(load = "omelasticsearch")
>>>>> >
>>>>> > input(type = "imfile" Ruleset="X" ...)
>>>>> > ruleset(name = "X") {
>>>>> >  action(type="mmutf8fix")
>>>>> >  action(type = "mmnormalize" ...)
>>>>> >  action(type = "omelasticsearch" ...)
>>>>> > }
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Alec
>>>>> >
>>>>> > On Tue, Jun 28, 2016 at 4:49 PM, Alec Swan <[email protected]>
>>>>> > wrote:
>>>>> >
>>>>> >> Thanks for the suggestion, Dave.  I noticed that on the client side
>>>>> >> the log contained Hindi characters that got translated to
>>>>> >> "\u00E0\u00.4???\" which eventually caused the error. I'll give
>>>>> >> mmutf8fix plugin a try.
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >> Alec
>>>>> >>
>>>>> >> On Tue, Jun 28, 2016 at 3:24 PM, Dave Caplinger <
>>>>> >> [email protected]> wrote:
>>>>> >>
>>>>> >>> On Jun 28, 2016, at 4:04 PM, Alec Swan <[email protected]> wrote:
>>>>> >>> >
>>>>> >>> > I think the root cause of the problem is that there is an invalid
>>>>> >>> > UTF-8 sequence "\u00.4" in the value of the "message" field. In
>>>>> >>> > fact, I just confirmed that {"message":"\u00.4"} is not a valid
>>>>> >>> > JSON on http://jsonlint.com/.
>>>>> >>>
>>>>> >>> I've run into something similar where the original message source
>>>>> >>> was sending Windows-1252 or other character set.  Rsyslog doesn't
>>>>> >>> know the incoming character set, so it doesn't know that it needs to
>>>>> >>> be converted to UTF-8. (That particular input would receive logs from
>>>>> >>> various sources, so the character set could vary per message).
>>>>> >>>
>>>>> >>> The fix we used was to add action(type="mmutf8fix") to the affected
>>>>> >>> ruleset prior to any JSON template use.  This isn't strictly
>>>>> >>> accurate because you lose the 'invalid' character in the resulting
>>>>> >>> string, but at least that string is JSON-safe.  In the ideal case
>>>>> >>> you'd know what the original character set was and explicitly convert
>>>>> >>> it to UTF-8, but that wasn't practical in our use case.
>>>>> >>>
>>>>> >>> --
>>>>> >>> Dave Caplinger | Director, Technical Product Management
>>>>> >>> Solutionary — An NTT Group Security Company
>>>>> >>>
>>>>> >>> _______________________________________________
>>>>> >>> rsyslog mailing list
>>>>> >>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> >>> http://www.rsyslog.com/professional-services/
>>>>> >>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED
by a
>>>>> myriad
>>>>> >>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if
>>>>> you
>>>>> >>> DON'T LIKE THAT.
>>>>> >>>
