Reading file with Unicode characters
Hi,

Does SparkContext's textFile() method handle files with Unicode characters? How about files in UTF-8 format? Going further, is it possible to specify an encoding to the method? If not, what should one do if the files to be read are in some other encoding?

Thanks,
arun
RE: Reading file with Unicode characters
Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop essentially supports only Linux, UTF-8 is the only encoding supported, as it is the default one on Linux. If you have data in another encoding, you may want to vote for this JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-232

Yong
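Until that JIRA is addressed, a common workaround is to drop down to hadoopFile() and decode the raw bytes yourself rather than letting Text assume UTF-8. A minimal sketch (the path and the ISO-8859-1 charset here are just placeholders for your own values):

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapred.TextInputFormat

  // Read (byte offset, line) pairs via the Hadoop input format, then
  // decode the raw bytes with the charset you actually need.
  val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/path/to/file")
    .map { case (_, text) =>
      // Text reuses its internal buffer, so only take getLength bytes
      new String(text.getBytes, 0, text.getLength, "ISO-8859-1")
    }

Note that this still relies on TextInputFormat's newline splitting, so it only works for encodings in which '\n' keeps its ASCII byte value (single-byte charsets, UTF-8); a multi-byte encoding like UTF-16 would need a custom input format.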
Re: Reading file with Unicode characters
Thanks!

arun