Is the original file indeed UTF-8? Windows environments in particular tend to
mangle files (e.g. Java on Windows does not use UTF-8 by default). The software
that processed the data beforehand could also have modified it.
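A quick way to confirm that diagnosis: the "K�BENHAVN" symptom below is exactly what you get when ISO-8859-1 (Latin-1) bytes are decoded as UTF-8. A minimal sketch in plain Python, no Spark involved (the sample string is taken from the mail below):

```python
# In ISO-8859-1, "ø" is the single byte 0xF8. That byte is not a valid
# UTF-8 sequence, so a UTF-8 decoder in replacement mode turns it into
# U+FFFD ("�") -- the character shown in the report below.
raw = "KøBENHAVN".encode("iso-8859-1")
print(raw.decode("utf-8", errors="replace"))  # K�BENHAVN
```

A strict decoder would raise an error on the same bytes instead of emitting replacement characters, which is another way to spot a mislabeled file before it reaches Spark.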

> Am 10.11.2018 um 02:17 schrieb lsn24 <lekshmi.a...@hotmail.com>:
> 
> Hello,
> 
> Per the documentation, the default character encoding of Spark is UTF-8. But
> when I try to read non-ASCII characters, Spark tends to read them as question
> marks. What am I doing wrong? Below is my syntax:
> 
> val ds = spark.read.textFile("a .bz2 file from hdfs");
> ds.show();
> 
> The string "KøBENHAVN"  gets displayed as "K�BENHAVN"
> 
> I did the testing in the Spark shell and also ran the same command as part of
> a Spark job. Both yield the same result.
> 
> I don't know what I am missing. I read the documentation but couldn't find
> any explicit config for this.
> 
> Any pointers will be greatly appreciated!
> 
> Thanks
> 
> 
> 
> 
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
