Perhaps your file is not UTF-8 encoded.

I cannot reproduce the problem; the same kind of data reads back correctly for me (see below).
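
If the file is actually GBK (or some other non-UTF-8 encoding), sc.textFile will still decode the bytes as UTF-8, because Hadoop's Text type assumes UTF-8. A minimal sketch of the usual workaround, reading the raw bytes and decoding them with an explicit charset ("GBK" here is an assumption; substitute the file's real encoding):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Read raw (offset, Text) records instead of letting textFile decode as UTF-8.
val lines = sc
  .hadoopFile[LongWritable, Text, TextInputFormat]("hdfs://xxxx/test.txt")
  .map { case (_, text) =>
    // Text.getBytes returns the backing array, so only use the first getLength bytes.
    new String(text.getBytes, 0, text.getLength, "GBK") // assumed encoding
  }

lines.first()
```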

### HADOOP:
~/Downloads ❯❯❯ hdfs dfs -cat hdfs://xxxx/test.txt
17/04/06 13:43:58 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
1.0     862910025238798 100733314       18_百度输入法:100733314
8919173c6d49abfab02853458247e584        1:129:18_百度输入法:1.0%

### SPARK
scala> val t = sc.textFile("hdfs://xxxx/test.txt")
t: org.apache.spark.rdd.RDD[String] = hdfs://xxxx/test.txt
MapPartitionsRDD[3] at textFile at <console>:24

scala> t.first
res2: String = 1.0     862910025238798 100733314       18_百度输入法:100733314
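
If you want to check what is actually stored on disk, you can dump the first record's bytes in hex (same placeholder path as above; UTF-8 Chinese text shows up as multi-byte sequences, e.g. e7 99 be for 百):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Grab the raw bytes of the first line, before any charset decoding.
val firstBytes = sc
  .hadoopFile[LongWritable, Text, TextInputFormat]("hdfs://xxxx/test.txt")
  .map { case (_, text) => java.util.Arrays.copyOf(text.getBytes, text.getLength) }
  .first()

// Print as hex so the encoding is visible regardless of terminal settings.
println(firstBytes.map(b => f"$b%02x").mkString(" "))
```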

On Thu, Apr 6, 2017 at 12:47 PM, Jone Zhang <joyoungzh...@gmail.com> wrote:

> var textFile = sc.textFile("xxx");
> textFile.first();
> res1: String = 1.0     100733314       18_?????:100733314
> 8919173c6d49abfab02853458247e584        1:129:18_?????:1.0
>
>
> hadoop fs -cat xxx
> 1.0    100733314       18_百度输入法:100733314 8919173c6d49abfab02853458247e584
>      1:129:18_百度输入法:1.0
>
> Why do Chinese characters appear garbled when I use Spark textFile?
> The encoding of the HDFS file is UTF-8.
>
>
> Thanks
>
