This is the first I've heard of this; we'll need to have Rainer comment on it.
David Lang
On Wed, 6 Jul 2016, Alec Swan wrote:
Date: Wed, 6 Jul 2016 15:34:44 -0600
From: Alec Swan <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] Invalid JSON from mmnormalize/liblognorm/omelasticsearch
Dave, I tried using liblognorm to parse the log message, and it looks like
the liblognorm %rest% type can only match fewer than 10240 characters. For
example, the following rule succeeds in parsing a 10239-character message but
fails on a 10240-character one.
rule=:%message:rest%
The particular log file I am parsing contains enormous log messages, e.g.
180,000 characters in a single line. So, is there really a limit of 10240
characters in liblognorm? If so, what's the recommended way to handle
parsing of extremely large messages?
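To pin down the boundary, one could generate test lines of exact lengths and
feed each one through the rule above. This is only a minimal Python sketch:
the helper name make_test_line and the filler character are illustrative
assumptions and have nothing to do with liblognorm itself.

```python
def make_test_line(length: int, filler: str = "x") -> str:
    """Return a single-line message of exactly `length` characters."""
    if len(filler) != 1:
        raise ValueError("filler must be a single character")
    return filler * length

# Lengths on either side of the observed boundary, plus one very large line
# similar in size to the messages described above.
lengths = [10239, 10240, 180000]
lines = [make_test_line(n) for n in lengths]

for line in lines:
    print(len(line))
```

Each generated line can be written to its own file and passed to the parser to
see which lengths succeed and which fail.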
Thanks,
Alec
On Wed, Jun 29, 2016 at 6:09 PM, David Lang <[email protected]> wrote:
This is helping narrow things down.
I would have rsyslog write to a file with the template that you use to
send to elasticsearch.
I would also use the liblognorm command-line tool to parse the file and
output json.
let's try to see where it breaks.
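Once the output is in a file, a quick way to see where it breaks is to try
parsing every line as JSON. The following is a rough Python sketch, assuming
the file holds one JSON document per line; the function name
find_invalid_json_lines and the sample data are made up for illustration.

```python
import json

def find_invalid_json_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            bad.append((number, str(err)))
    return bad

# Illustrative input: one well-formed line and one with the malformed
# escape sequence reported in this thread.
sample = [
    '{"message": "plain ascii"}',
    '{"message": "\\u00.4"}',
]
bad_lines = find_invalid_json_lines(sample)
print(bad_lines)
```

For a real file, something like find_invalid_json_lines(open(path,
encoding="utf-8", errors="replace")) would report the offending line numbers.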
David Lang
On Wed, 29 Jun 2016, Alec Swan wrote:
David, as you suggested, I extracted the log lines containing Hindi
characters in a separate file and ran "file -bi" which returned
"text/plain; charset=utf-8". Which confirms that logs are written in
UTF-8.
Any thoughts on what would cause rsyslog to send messages like
"\u00E0\u00.4??? Description in Hindi", causing Elasticsearch to throw an
exception?
Thanks,
Alec
On Wed, Jun 29, 2016 at 4:08 PM, alecswan <[email protected]> wrote:
I looked at the code that produces this log file and it's writing the log
with utf-8 encoding. What else could cause this problem? Could it be that
Hindi characters may require 3 bytes for encoding? Just grasping at straws
here ...
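On the 3-byte question, a small Python check shows how a Devanagari character
is encoded in UTF-8. The letter chosen (U+0915) is only an illustrative
example, and the per-byte escaping at the end is a guess at how output
starting with \u00E0 could arise, not a confirmed diagnosis.

```python
# U+0915 DEVANAGARI LETTER KA, used here only as an illustrative Hindi character.
ka = "\u0915"
encoded = ka.encode("utf-8")

print(len(encoded))                           # number of bytes in the UTF-8 encoding
print(" ".join(f"{b:02x}" for b in encoded))  # the individual bytes in hex

# Hypothesis only: escaping each byte as if it were its own code point yields
# a string that starts with \u00E0, resembling the prefix seen in the bad output.
per_byte = "".join(f"\\u{b:04X}" for b in encoded)
print(per_byte)
```

If something in the pipeline treats the UTF-8 bytes as individual characters,
the first byte of such a sequence would show up as \u00E0.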
Thanks,
Alec
-------- Original message --------
From: David Lang
Date: 29/06/2016 2:00 PM (GMT-07:00)
To: rsyslog-users
Subject: Re: [rsyslog] Invalid JSON from mmnormalize/liblognorm/omelasticsearch
On Wed, 29 Jun 2016, Alec Swan wrote:
> I tried using mmutf8fix as shown below, but it didn't seem to fix the
> problem. What I am doing is monitoring a log file with imfile action,
> parsing it with mmnormalize and sending JSON to Elasticsearch with
> omelasticsearch.
>
> I checked the encoding of the log file using "file -bi" and it says
> "text/plain; charset=us-ascii".
> However, it contains some Hindi characters, which I assume are encoded with
> us-ascii.
There is no way to encode Hindi characters as us-ascii. us-ascii is the most
basic character set: English upper case, lower case, digits and punctuation
only.
So whatever character set it is in, it's not us-ascii.
> If I understand correctly,
> us-ascii is a subset of UTF-8. If this is the case, do I really need to use
> mmutf8fix?
It all depends on what character set it's actually in. Try making a copy of
the file that has the Hindi characters near the beginning of it and try the
file -bi again, see if it gives a more accurate answer.
Otherwise, you will have to track down what's writing the messages and try to
set the character set there (or at least find out what character set it's
using).
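If file only samples the start of the file, which is what the suggestion above
assumes, another option is to scan every byte yourself. This is a minimal
Python sketch; the function names and the sample data are invented for
illustration, and "unknown" simply means the bytes are neither pure ASCII nor
valid UTF-8.

```python
def classify_bytes(data: bytes) -> str:
    """Classify raw bytes as 'us-ascii', 'utf-8', or 'unknown'."""
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"

def first_non_ascii_offset(data: bytes) -> int:
    """Return the offset of the first byte >= 0x80, or -1 if there is none."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset
    return -1

# Illustrative data: a long ASCII prefix followed by one Devanagari letter,
# mimicking a log where the non-ASCII text appears late in the file.
sample = b"a" * 100000 + "\u0915".encode("utf-8")
print(classify_bytes(sample), first_non_ascii_offset(sample))
```

For a real log, read it in binary mode, e.g. classify_bytes(open(path,
"rb").read()), and use the reported offset to find the first suspicious
message.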
David Lang
> To me it seems like the Hindi characters are UTF-8 encoded with 3-byte
> sequences and when they are received by Elasticsearch the byte sequence is
> incorrectly decoded to an invalid Unicode sequence, such as "\u00.4". Is this
> plausible?
>
> module(load = "imfile")
> module(load="mmutf8fix")
> module(load = "mmnormalize")
> module(load = "omelasticsearch")
>
> input(type = "imfile" Ruleset="X" ...)
> ruleset(name = "X") {
> action(type="mmutf8fix")
> action(type = "mmnormalize" ...)
> action(type = "omelasticsearch" ...)
> }
>
> Thanks,
>
> Alec
>
> On Tue, Jun 28, 2016 at 4:49 PM, Alec Swan <[email protected]> wrote:
>
>> Thanks for the suggestion, Dave. I noticed that on the client side the
>> log contained Hindi characters that got translated to "\u00E0\u00.4???\"
>> which eventually caused the error. I'll give mmutf8fix plugin a try.
>>
>> Thanks,
>>
>> Alec
>>
>> On Tue, Jun 28, 2016 at 3:24 PM, Dave Caplinger <[email protected]> wrote:
>>
>>> On Jun 28, 2016, at 4:04 PM, Alec Swan <[email protected]> wrote:
>>> >
>>> > I think the root cause of the problem is that there is an invalid UTF-8
>>> > sequence "\u00.4" in the value of the "message" field. In fact, I just
>>> > confirmed that {"message":"\u00.4"} is not valid JSON on
>>> > http://jsonlint.com/.
>>>
>>> I've run into something similar where the original message source was
>>> sending Windows-1252 or another character set. Rsyslog doesn't know the
>>> incoming character set, so it doesn't know that it needs to be converted to
>>> UTF-8. (That particular input would receive logs from various sources, so
>>> the character set could vary per message).
>>>
>>> The fix we used was to add action(type="mmutf8fix") to the affected
>>> ruleset prior to any JSON template use. This isn't strictly accurate
>>> because you lose the 'invalid' character in the resulting string, but at
>>> least that string is JSON-safe. In the ideal case you'd know what the
>>> original character set was and explicitly convert it to UTF-8, but that
>>> wasn't practical in our use case.
>>>
>>> --
>>> Dave Caplinger | Director, Technical Product Management
>>> Solutionary — An NTT Group Security Company
>>>
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>> DON'T LIKE THAT.
>>>
>>
>>