Reading file with Unicode characters
Hi,

Does SparkContext's textFile() method handle files with Unicode characters? How about files in UTF-8 format? Going further, is it possible to specify an encoding to the method? If not, what should one do if the files to be read are in some other encoding?

Thanks,
arun
RE: Reading file with Unicode characters
Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop essentially supports only Linux, UTF-8 is the only encoding supported, as it is the default one on Linux. If you have data in another encoding, you may want to vote for this JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-232

Yong
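Until that JIRA is addressed, a common workaround is to drop down to hadoopFile() and decode the raw bytes yourself rather than letting Text assume UTF-8. A minimal sketch (the path and the ISO-8859-1 charset here are just placeholders for your own values):

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapred.TextInputFormat

  // Read (byte offset, line) pairs via the Hadoop input format, then
  // decode the raw bytes with the charset you actually need.
  val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/path/to/file")
    .map { case (_, text) =>
      // Text reuses its internal buffer, so only take getLength bytes
      new String(text.getBytes, 0, text.getLength, "ISO-8859-1")
    }

Note that this still relies on TextInputFormat's newline splitting, so it only works for encodings in which '\n' keeps its ASCII byte value (single-byte charsets, UTF-8); a multi-byte encoding like UTF-16 would need a custom input format.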
Re: Reading file with Unicode characters
Thanks!

arun