[ https://issues.apache.org/jira/browse/SPARK-26971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776194#comment-16776194 ]
Hyukjin Kwon commented on SPARK-26971:
--------------------------------------

Questions should go to the mailing list rather than being filed as an issue here.

> How to read delimiter (Cedilla) in spark RDD and Dataframes
> -----------------------------------------------------------
>
>                 Key: SPARK-26971
>                 URL: https://issues.apache.org/jira/browse/SPARK-26971
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 1.6.0
>            Reporter: Babu
>            Priority: Minor
>
> I am trying to read a cedilla-delimited HDFS text file. I am getting the
> error below; did anyone face a similar issue?
> {{hadoop fs -cat test_file.dat}}
> {{1ÇCelvelandÇOhio}}
> {{2ÇDurhamÇNC}}
> {{3ÇDallasÇTexas}}
> {{>>> rdd = sc.textFile("test_file.dat")}}
> {{>>> rdd.collect()}}
> {{[u'1\xc7Celveland\xc7Ohio', u'2\xc7Durham\xc7NC', u'3Dallas\xc7Texas']}}
> {{>>> rdd.map(lambda p: p.split("\xc7")).collect()}}
> {{UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 0: ordinal not in range(128)}}
> {{>>> sqlContext.read.format("text").option("delimiter","Ç").option("encoding","ISO-8859").load("/user/cloudera/test_file.dat").show()}}
> |1ÇCelvelandÇOhio|
> {{2ÇDurhamÇNC}}
> {{3DallasÇTexas}}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
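[Editor's note] The error in the quoted question comes from splitting a byte
string on a non-ASCII delimiter under Python 2's default ASCII codec. A minimal
sketch of the fix, in plain Python without Spark (the file contents mirror the
question; in PySpark the same idea would be splitting each line on the unicode
literal u"\xc7", e.g. rdd.map(lambda p: p.split(u"\xc7")), since
SparkContext.textFile already returns unicode by default):

```python
# -*- coding: utf-8 -*-
# The file is ISO-8859-1 encoded, so decode the raw bytes to unicode first,
# then split on the cedilla code point (U+00C7) rather than on a byte string.
raw = b"1\xc7Celveland\xc7Ohio\n2\xc7Durham\xc7NC\n3\xc7Dallas\xc7Texas\n"

rows = [line.split(u"\xc7")                     # split on the cedilla
        for line in raw.decode("iso-8859-1").splitlines()]

for row in rows:
    print(row)
# first row: ['1', 'Celveland', 'Ohio']
```

For the DataFrame path, the {{text}} source ignores a delimiter option; the
CSV reader is the usual choice, e.g.
{{spark.read.csv(path, sep=u"\xc7", encoding="ISO-8859-1")}} on Spark 2.x
(whether this applies on the reporter's 1.6.0 is not confirmed here).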