[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

Sean Owen (JIRA) Tue, 16 May 2017 07:11:36 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012455#comment-16012455
 ]


Sean Owen commented on SPARK-18359:
-----------------------------------

Using the JVM locale is a bad way to get this behavior, because it's not 
portable. Input would mysteriously work on one machine and not another, or 
succeed but quietly give the wrong output. It also caused some SQL-related 
methods to return the wrong value on non-US-locale machines. That's a big(ger) 
problem that had to be fixed.

Yes, the problem is there isn't a way to specify non-US locales just for the 
CSV parsing. That's what this is about, and yes you should work on it if you 
need the functionality.

As a workaround you can do some preprocessing to parse the dates manually. Not 
great, but not hard either.
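
A minimal sketch of that kind of preprocessing, in plain Python rather than any 
Spark API. It assumes a semicolon-delimited file with dd/MM/yyyy dates and 
comma-decimal amounts. The column names ("day", "amount") and the helper names 
are hypothetical, chosen only for illustration. Every format is spelled out 
explicitly, so the result does not depend on the locale of the machine running 
it.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def parse_decimal_comma(text):
    """Parse a number written with ',' as the decimal separator.

    Assumes '.' and spaces (including non-breaking spaces) are only
    ever used as grouping separators in the input.
    """
    cleaned = text.strip().replace("\u00a0", "").replace(" ", "").replace(".", "")
    return Decimal(cleaned.replace(",", "."))


def parse_date(text, fmt="%d/%m/%Y"):
    """Parse a date using an explicit, purely numeric format string."""
    return datetime.strptime(text.strip(), fmt).date()


def preprocess(raw, delimiter=";"):
    """Rewrite each row into locale-neutral strings (ISO dates, dot decimals).

    The column names 'day' and 'amount' are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(raw), delimiter=delimiter)
    rows = []
    for row in reader:
        rows.append({
            "day": parse_date(row["day"]).isoformat(),
            "amount": str(parse_decimal_comma(row["amount"])),
        })
    return rows


if __name__ == "__main__":
    sample = "day;amount\n16/05/2017;1.234,56\n"
    for row in preprocess(sample):
        print(row)
```

The normalized rows can then be written back out as an ordinary CSV and read 
with the default parsing settings.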

> Let user specify locale in CSV parsing
> --------------------------------------
>
>                 Key: SPARK-18359
>                 URL: https://issues.apache.org/jira/browse/SPARK-18359
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: yannick Radji
>
> On the DataFrameReader object there is no CSV-specific option to set the 
> decimal delimiter to a comma instead of a dot, as is usual in France and 
> elsewhere in Europe.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

