On Wed, 29 Jun 2016, Alec Swan wrote:

I tried using mmutf8fix as shown below, but it didn't seem to fix the
problem. What I am doing is monitoring a log file with the imfile input,
parsing it with mmnormalize, and sending JSON to Elasticsearch with
omelasticsearch.

I check the encoding of the log file using "file -bi" and it says
"text/plain; charset=us-ascii".

However, it contains some Hindi characters, which I assume are encoded with us-ascii.

There is no way to encode Hindi characters as us-ascii. US-ASCII is the most basic character set: a 7-bit code covering English upper case, lower case, digits, punctuation and control characters only.

So whatever character set it is in, it's not us-ascii.
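To make this concrete, here is a small Python sketch (Python is only a convenient way to look at the bytes; the sample word is arbitrary and not taken from the original log). No Devanagari code point fits in 7-bit ASCII, and in UTF-8 each one becomes three bytes, all of them 0x80 or above:

```python
# Sample Hindi text ("namaste"), written as escapes for clarity.
text = "\u0928\u092e\u0938\u094d\u0924\u0947"

# US-ASCII is a 7-bit code, so only code points 0-127 can be encoded.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("cannot encode as us-ascii:", err.reason)

# In UTF-8 every Devanagari code point (U+0900-U+097F) takes 3 bytes,
# and each of those bytes is >= 0x80, i.e. outside the ASCII range.
for ch in text:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} bytes)")
```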

If I understand correctly,
us-ascii is a subset of UTF-8. If this is the case, do I really need to use
mmutf8fix?

It all depends on what character set it's actually in. Try making a copy of the file with the Hindi characters near the beginning and run file -bi on it again; see if it gives a more accurate answer.
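Why moving the Hindi text to the front can matter: pure ASCII bytes decode identically as UTF-8, so a detector that only looks at the start of a file will call it us-ascii if the first non-ASCII byte appears later. The sketch below is a hypothetical, simplified sniffer, not how `file` is actually implemented; `guess_charset` and its `sample_size` parameter are made up for illustration, and the 4096-byte sample is an arbitrary assumption rather than `file`'s real limit.

```python
import codecs
from typing import Optional


def guess_charset(data: bytes, sample_size: Optional[int] = None) -> str:
    """Hypothetical, simplified charset sniffer in the spirit of `file -bi`.

    Only the first `sample_size` bytes are inspected (all bytes if None).
    """
    sample = data if sample_size is None else data[:sample_size]
    if all(b < 0x80 for b in sample):
        # Pure 7-bit data: valid US-ASCII, and therefore also valid UTF-8.
        return "us-ascii"
    whole_file = sample_size is None or sample_size >= len(data)
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # When only a prefix is examined, tolerate a multi-byte sequence
        # that was cut off at the end of the sample (final=False).
        decoder.decode(sample, final=whole_file)
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"


# 1000 ASCII lines followed by one line of Hindi encoded as UTF-8.
log = b"plain ascii line\n" * 1000 + "\u0928\u092e\u0938\u094d\u0924\u0947\n".encode("utf-8")

print(guess_charset(log, sample_size=4096))  # us-ascii (prefix only)
print(guess_charset(log))                    # utf-8 (whole file)
```

The same bytes get two different labels depending only on how much of the file is examined, which is why the label from a prefix is not proof that the rest of the file is ASCII.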

Otherwise, you will have to track down what's writing the messages and try to set the character set there (or at least find out what character set it's using).

David Lang

To me it seems like the Hindi characters are UTF-8 encoded as 3-byte
sequences, and when they are received by Elasticsearch the byte sequence is
incorrectly decoded into an invalid Unicode escape sequence, such as "\u00.4".
Is this plausible?
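The hypothesis is at least consistent with the bytes involved. Every character in the Devanagari block (U+0900-U+097F) is encoded in UTF-8 as E0 A4 xx or E0 A5 xx. If some hop decodes those bytes one at a time as Latin-1 (an assumption for illustration; the thread does not establish where this happens), then escaping the result to ASCII-only JSON produces "\u00e0\u00a4...", which matches the reported "\u00E0\u00.4" prefix if the "." is a garbled "A". A minimal Python sketch:

```python
import json

# U+0915 DEVANAGARI LETTER KA, encoded as UTF-8, is the 3 bytes E0 A4 95.
utf8_bytes = "\u0915".encode("utf-8")

# Assumed mis-step: treating those bytes as Latin-1 (ISO-8859-1) turns each
# byte into its own code point: U+00E0, U+00A4, U+0095.
mojibake = utf8_bytes.decode("latin-1")

# Serializing with ASCII-only JSON escapes gives the "\u00e0\u00a4..." pattern.
print(json.dumps(mojibake))  # "\u00e0\u00a4\u0095"
```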

module(load = "imfile")
module(load="mmutf8fix")
module(load = "mmnormalize")
module(load = "omelasticsearch")

input(type = "imfile" Ruleset="X" ...)
ruleset(name = "X") {
 action(type="mmutf8fix")
 action(type = "mmnormalize" ...)
 action(type = "omelasticsearch" ...)
}

Thanks,

Alec

On Tue, Jun 28, 2016 at 4:49 PM, Alec Swan <[email protected]> wrote:

Thanks for the suggestion, Dave.  I noticed that on the client side the
log contained Hindi characters that got translated to "\u00E0\u00.4???\"
which eventually caused the error. I'll give the mmutf8fix plugin a try.

Thanks,

Alec

On Tue, Jun 28, 2016 at 3:24 PM, Dave Caplinger <[email protected]> wrote:

On Jun 28, 2016, at 4:04 PM, Alec Swan <[email protected]> wrote:
>
> I think the root cause of the problem is that there is an invalid UTF-8
> sequence "\u00.4" in the value of the "message" field. In fact, I just
> confirmed that {"message":"\u00.4"} is not valid JSON on
> http://jsonlint.com/.
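The same validity check can be reproduced locally. This sketch uses Python's standard json module purely as a convenient strict parser; `is_valid_json` is a hypothetical helper name. In JSON a \u escape must be followed by exactly four hex digits, and "." is not one:

```python
import json


def is_valid_json(document: str) -> bool:
    """Return True if a strict JSON parser accepts `document`."""
    try:
        json.loads(document)
        return True
    except json.JSONDecodeError:
        return False


# A \u escape needs exactly four hex digits; "." is not a hex digit.
print(is_valid_json('{"message":"\\u00.4"}'))  # False
print(is_valid_json('{"message":"\\u00a4"}'))  # True
```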

I've run into something similar where the original message source was
sending Windows-1252 or another character set.  Rsyslog doesn't know the
incoming character set, so it doesn't know that it needs to be converted to
UTF-8. (That particular input would receive logs from various sources, so
the character set could vary per message.)
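As an illustration of why Windows-1252 input breaks a UTF-8 consumer, here is a small Python sketch; the sample word and the `first_invalid_utf8_offset` helper are hypothetical. A single-byte accented letter in Windows-1252 happens to look like the start of a multi-byte UTF-8 sequence, but the bytes after it are not valid continuation bytes:

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start


# "résumé" as a Windows-1252 source would write it: é is the single byte 0xE9.
cp1252_bytes = "résumé".encode("cp1252")

# 0xE9 is a UTF-8 lead byte for a 3-byte sequence, but it is followed by
# "s" (0x73) rather than a continuation byte, so strict decoding fails.
offset = first_invalid_utf8_offset(cp1252_bytes)
print(offset, hex(cp1252_bytes[offset]))  # 1 0xe9
```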

The fix we used was to add action(type="mmutf8fix") to the affected
ruleset before any action that uses a JSON template.  This isn't strictly
accurate because you lose the 'invalid' character in the resulting string,
but at least that string is JSON-safe.  In the ideal case you'd know what the
original character set was and explicitly convert it to UTF-8, but that wasn't
practical in our use case.
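The trade-off can be sketched in Python. `utf8fix` below is only a rough approximation of the idea behind mmutf8fix (replace bytes that are not valid UTF-8 so the string becomes safe to embed in JSON); it is not the module's actual algorithm, and the assumption that the replacement is a space follows mmutf8fix's documented default replacementChar, so check the docs for your version. `to_utf8` shows the lossless alternative when the source charset is known. Both function names are hypothetical.

```python
import codecs


def _space_per_bad_byte(err: UnicodeDecodeError):
    # Replace each undecodable byte with one space, then resume after it.
    return " " * (err.end - err.start), err.end


codecs.register_error("utf8fix_space", _space_per_bad_byte)


def utf8fix(data: bytes) -> str:
    """Rough approximation of mmutf8fix-style cleanup (not its real code)."""
    return data.decode("utf-8", errors="utf8fix_space")


def to_utf8(data: bytes, source_charset: str) -> bytes:
    """Lossless alternative when the source charset is known."""
    return data.decode(source_charset).encode("utf-8")


raw = "résumé".encode("cp1252")
print(repr(utf8fix(raw)))                      # 'r sum '  (accented letters lost)
print(to_utf8(raw, "cp1252").decode("utf-8"))  # résumé
```

Valid UTF-8 passes through `utf8fix` unchanged; only the bytes that cannot be decoded are lost, which matches the "you lose the 'invalid' character" caveat above.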

--
Dave Caplinger | Director, Technical Product Management
Solutionary — An NTT Group Security Company

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.


