Reading file with Unicode characters

2015-04-08 Thread Arun Lists
Hi,

Does SparkContext's textFile() method handle files with Unicode characters?
How about files in UTF-8 format?

Going further, is it possible to specify an encoding to the method? If not,
what should one do if the files to be read are in some other encoding?

Thanks,
arun


RE: Reading file with Unicode characters

2015-04-08 Thread java8964
Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop almost
exclusively targets Linux, UTF-8 is the only encoding supported, as it is the
default one on Linux.
If your data is in another encoding, you may want to vote for this Jira:
https://issues.apache.org/jira/browse/MAPREDUCE-232
Yong
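
Until an encoding option exists, one workaround often described for Spark is
to read the file through hadoopFile with TextInputFormat and decode the raw
bytes of each Text value yourself, using only the first getLength() bytes of
getBytes(). I believe this works only for encodings in which a line break is
still the single byte 0x0A, so not for UTF-16. The sketch below is plain
Python rather than Spark, and only illustrates the underlying idea: split on
newline bytes first, then decode each line with an explicit charset. The
helper names and the sample text are made up for illustration.

```python
from typing import List


def split_lines(raw: bytes) -> List[bytes]:
    """Split a byte buffer on newline bytes, dropping empty lines."""
    return [line.rstrip(b"\r") for line in raw.split(b"\n") if line]


def decode_lines(raw: bytes, encoding: str) -> List[str]:
    """Decode each line with the given charset; bad bytes become U+FFFD."""
    return [line.decode(encoding, errors="replace") for line in split_lines(raw)]


# Hypothetical sample: two lines of text stored as Latin-1, not UTF-8.
sample = "café\nnaïve\n".encode("latin-1")

# Decoding as UTF-8 (what a UTF-8-only reader would do) damages the text.
as_utf8 = decode_lines(sample, "utf-8")

# Decoding with the file's real charset recovers it.
as_latin1 = decode_lines(sample, "latin-1")

print(as_utf8)
print(as_latin1)
```

The point of the sketch is that the line splitting itself does not care about
the charset as long as newlines are single 0x0A bytes; only the final
bytes-to-string step needs to know the encoding.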


Re: Reading file with Unicode characters

2015-04-08 Thread Arun Lists
Thanks!

arun
