Interesting issue. Since you see the expected output with the raw format, Flume itself is handling the event correctly as a byte stream; the problem appears to be limited to the avrojson encoding in the sink.
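One way to check where the bytes get changed, as a rough sketch: the Avro JSON encoding writes a "bytes" value as a string with one character per byte (code points 0-255). Assuming the event body is emitted as such a bytes field (I have not verified this for the avrojson sink), UTF-8 text would look garbled in the output but would still be recoverable. The class and method names below are made up for illustration and are not part of Flume or Avro.

```java
import java.nio.charset.StandardCharsets;

public class BodyEncodingSketch {
    // Hypothetical helper: maps each byte to one char (code point 0-255),
    // which is how the Avro JSON encoding represents a "bytes" value.
    public static String bytesAsLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: reverses that mapping, then decodes the bytes as UTF-8.
    public static String recoverUtf8(String latin1) {
        byte[] raw = latin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Nihongo" written with Unicode escapes so the source file encoding does not matter.
        String original = "\u65E5\u672C\u8A9E";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        String mangled = bytesAsLatin1(body);
        // Three characters become nine, one per UTF-8 byte.
        System.out.println("chars after byte-wise mapping: " + mangled.length());
        // Reversing the mapping gives back the original text.
        System.out.println("recovered equals original: " + recoverUtf8(mangled).equals(original));
    }
}
```

If the text in your avrojson output can be recovered this way, the bytes are intact and only their JSON representation differs; if not, the data is being altered somewhere else.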
I took a look at the code and saw that Flume uses Avro 1.5.4 to encode the output as avrojson. There are a few open Avro bugs reported against 1.5.x for Unicode encoding:

https://issues.apache.org/jira/browse/AVRO-851
https://issues.apache.org/jira/browse/AVRO-860

Since the patches haven't been committed, I'm not sure whether your problem is caused by these Avro issues or not.

Do you have to use the avrojson formatter? How about raw?

-mingjie

On 11/05/2011 06:41 AM, Yoshiki Kajihara wrote:
> Hi,
>
> We are having trouble transferring multi-byte characters.
>
> When we send plain text containing multi-byte characters such as
> "日本語" (meaning "the Japanese language") to the sink, the characters
> do not appear as expected when the output formatter is "avrojson".
>
> When the output formatter is "raw", they appear as expected.
>
> Why do the characters appear correctly only when the formatter is set
> to "raw"?
>
> We are running version "Flume 0.9.4-cdh3u2".
>
> Please tell us how to solve this problem.
>
> Thanks.
>
> ----
> Yoshiki Kajihara