Hi all,

I'm using LogstreamerInput to collect some application logs and then
send them to Elasticsearch.
The log files look like Latin1, but I know they contain Chinese text
encoded as GBK/GB2312.
The problem is that when I output the logs, there are many unreadable
characters on the LogOutput screen and in Kibana.
I notice there is no config option for specifying the character
encoding with which to read the log content.

1. Is UTF-8 the default encoding in LogstreamerInput? I can't find it
in the source code.
2. Is there a way to solve this problem?
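The only workaround I can think of is to transcode each file from GBK to
UTF-8 before Heka reads it. Below is a minimal sketch of that idea in Go
using the golang.org/x/text packages; the file names are just placeholders,
and I'm assuming the GBK decoder also covers GB2312 since GBK is a superset
of it. This runs outside Heka; it's not something LogstreamerInput offers
as far as I can tell.

--------Go sketch: transcode a GBK log to UTF-8 (pre-processing, not Heka)--------
package main

// Transcode a GBK/GB2312-encoded log file to UTF-8 so that the file
// Heka reads is plain UTF-8. File names below are placeholders.

import (
	"io"
	"log"
	"os"

	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)

func main() {
	in, err := os.Open("nohup.log.2015-12-28_15") // assumed source file
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	out, err := os.Create("nohup.log.2015-12-28_15.utf8") // assumed target file
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// GBK is a superset of GB2312, so one decoder should handle both.
	decoder := simplifiedchinese.GBK.NewDecoder()
	if _, err := io.Copy(out, transform.NewReader(in, decoder)); err != nil {
		log.Fatal(err)
	}
}

I'd rather not run an extra pass over every file, so if LogstreamerInput
(or a splitter/decoder) can handle the encoding directly, please point me
at it.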

--------The LogstreamerInput config toml--------
[server1]
type = "LogstreamerInput"
splitter = "rule_splitter"
log_directory = "/home/wolfias/heka/rulelog"
file_match = 'nohup\.log\.(?P<Year>\d+)-(?P<Month>\d+)-(?P<Day>\d+)_(?P<Hour>\d+)'

priority = ["Year","Month","Day","Hour"]
  [server1.translation.year]
  missing = 9999

[rule_splitter]
type = "RegexSplitter"
delimiter = '########'
delimiter_eol = true

[PayloadEncoder]

[rule_KafkaOutput]
type = "KafkaOutput"
message_matcher = "TRUE"
topic = "rulelog"
addrs = ["10.63.204.70:9092"]
encoder = "PayloadEncoder"

[LogOutput]
message_matcher = "TRUE"
encoder = "PayloadEncoder"

--------nohup log files opened using GB2312--------
2015-12-28 15:46:37 信息: ########
2015-12-28 15:46:37 信息: 开始时间戳:1451288797063
2015-12-28 15:46:37 信息: 服务端Servet调用提示->BomRuleServet->进入servet程序!

--------nohup log files opened using utf-8--------
2015-12-28 15:46:37 ÐÅÏ¢: ########
2015-12-28 15:46:37 ÐÅÏ¢: ¿ªÊ¼Ê±¼ä´Á:1451288797063
2015-12-28 15:46:37 ÐÅÏ¢: ·þÎñ¶ËServetµ÷ÓÃÌáʾ->BomRuleServet->½øÈëservet³ÌÐò£¡
