[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2017-10-11 Thread Andrew Ash (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200831#comment-16200831
 ] 

Andrew Ash commented on SPARK-18359:


I agree with Sean -- using the submitting JVM's locale is insufficient.  Some 
users will want to join CSVs with different locales, and that doesn't offer a 
means of using different locales on different files.  This really should be 
specified on a per-csv basis, which is why I think the direction [~abicz] 
suggested supports this nicely:

{noformat}
spark.read.option("locale", "value").csv("filefromeurope.csv")
{noformat}

where the value is something we can use to get a {{java.util.Locale}}

Has anyone worked around this functionality gap in a useful way, or do we still 
need to work through these Locale issues in spark-csv?

> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2017-05-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012455#comment-16012455
 ] 

Sean Owen commented on SPARK-18359:
---

Using the JVM locale is a bad way to get this behavior, because it's not 
portable. Input would mysteriously work on one machine and not another, or 
succeed but quietly give the wrong output. It also caused some SQL-related 
methods to return the wrong value on non-US-locale machines. That's a big(ger) 
problem that had to be fixed.

Yes, the problem is there isn't a way to specify non-US locales just for the 
CSV parsing. That's what this is about, and yes you should work on it if you 
need the functionality.

As a workaround you can do some preprocessing to parse the dates manually. Not 
great, but not hard either.

> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2017-05-16 Thread Alexander Enns (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012377#comment-16012377
 ] 

Alexander Enns commented on SPARK-18359:


This is exactly why there is a possibility to set the locale when a JVM 
instance is created. When we submit a Job to our cluster we have to tell Spark 
that it requires DE locale as our data are coming in the corresponding format. 
After the migration from 1.6 to 2.1 we had to give up usage of any type apart 
of string for parsing of CSV data (sounds not like it's a correct way to me). I 
can not see why exactly the changes required for SPARK-18076 have been made. 
But at the moment there is no possibility to parse CSV data formatted using 
locale different from US and this appears to me like a quite big restriction. 

> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2017-05-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010709#comment-16010709
 ] 

Sean Owen commented on SPARK-18359:
---

This behavior was not correct, or at least more problematic, because behavior 
varied even within one cluster according to the particular of the environment. 
The machine env shouldn't affect correctness this way.

> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2017-05-15 Thread Alexander Enns (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010704#comment-16010704
 ] 

Alexander Enns commented on SPARK-18359:


IMHO the changes done for SPARK-18076 need to be reverted. 
The former behaviour and usage of JVM locale was correct.



> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2017-03-07 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900608#comment-15900608
 ] 

Takeshi Yamamuro commented on SPARK-18359:
--

Since JDK9 use CLDR as locale by default, it seems good to explicitly handle 
it: http://openjdk.java.net/jeps/252

> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2017-02-23 Thread Arkadiusz Bicz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880544#comment-15880544
 ] 

Arkadiusz Bicz commented on SPARK-18359:


Can interface to specify locale looks like?: 

spark.read.option("numberLocale", "German").csv("filefromeurope.csv")

> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2016-12-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764307#comment-15764307
 ] 

Sean Owen commented on SPARK-18359:
---

They're similar, but this is about locale vs time zone.

> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18359) Let user specify locale in CSV parsing

2016-12-20 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764298#comment-15764298
 ] 

Hyukjin Kwon commented on SPARK-18359:
--

Could this be resolved as a duplicate of SPARK-18937?

> Let user specify locale in CSV parsing
> --
>
> Key: SPARK-18359
> URL: https://issues.apache.org/jira/browse/SPARK-18359
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: yannick Radji
>
> On the DataFrameReader object there no CSV-specific option to set decimal 
> delimiter on comma whereas dot like it use to be in France and Europe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org