[ 
https://issues.apache.org/jira/browse/SPARK-18076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18076.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

Issue resolved by pull request 15610
[https://github.com/apache/spark/pull/15610]

> Fix default Locale used in DateFormat, NumberFormat to Locale.US
> ----------------------------------------------------------------
>
>                 Key: SPARK-18076
>                 URL: https://issues.apache.org/jira/browse/SPARK-18076
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, Spark Core, SQL
>    Affects Versions: 2.0.1
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>              Labels: releasenotes
>             Fix For: 2.1.0
>
>
> Many parts of the code use {{DateFormat}} and {{NumberFormat}} instances. 
> Although the behavior of these formats is mostly determined by things like 
> format strings, the exact behavior can vary according to the platform's 
> default locale. Although the locale defaults to "en", it can be set to 
> something else by env variables. If it is, the same code can succeed or 
> fail based just on locale:
> {code}
> import java.text._
> import java.util._
> def parse(s: String, l: Locale) = new SimpleDateFormat("yyyyMMMdd", l).parse(s)
> parse("1989Dec31", Locale.US)
> Sun Dec 31 00:00:00 GMT 1989
> parse("1989Dec31", Locale.UK)
> Sun Dec 31 00:00:00 GMT 1989
> parse("1989Dec31", Locale.CHINA)
> java.text.ParseException: Unparseable date: "1989Dec31"
>   at java.text.DateFormat.parse(DateFormat.java:366)
>   at .parse(<console>:18)
>   ... 32 elided
> parse("1989Dec31", Locale.GERMANY)
> java.text.ParseException: Unparseable date: "1989Dec31"
>   at java.text.DateFormat.parse(DateFormat.java:366)
>   at .parse(<console>:18)
>   ... 32 elided
> {code}
> Where not otherwise specified, I believe all instances in the code should 
> default to some fixed value, and that should probably be {{Locale.US}}. This 
> matches the JVM's default, and specifies both language ("en") and region 
> ("US") to remove ambiguity. This most closely matches what the current code 
> behavior would be (unless the default locale was changed), because it will 
> currently default to "en".
> This affects SQL date/time functions. At the moment, the only SQL function 
> that lets the user specify language/country is "sentences", which is 
> consistent with Hive.
> It affects dates passed in the JSON API. 
> It affects some strings rendered in the UI, potentially. Although this isn't 
> a correctness issue, there may be an argument for not letting that vary (?)
> It affects a bunch of instances where dates are formatted into strings for 
> things like IDs or file names, which is far less likely to cause a problem, 
> but worth making consistent.
> The other occurrences are in tests.
> The downside to this change is also its upside: the behavior doesn't depend 
> on the default JVM locale, but it also can't be affected by the default JVM 
> locale. For example, if you wanted to parse some dates in a way that depended 
> on a non-US locale (not just the format string) then it would no longer be 
> possible. There's no means of specifying this, for example, in SQL functions 
> for parsing dates. However, controlling this by globally changing the locale 
> isn't exactly great either.
> The purpose of this change is to make the current default behavior 
> deterministic and fixed. PR coming.
> CC [~hyukjin.kwon]
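
The description above only demonstrates the problem for DateFormat. A minimal sketch of the same locale sensitivity in NumberFormat follows; the helper name `format` and the sample value are illustrative assumptions, not code from the pull request.

```scala
import java.text.NumberFormat
import java.util.Locale

// Hypothetical helper (not from the PR): format a number under an explicit locale.
def format(n: Double, l: Locale): String =
  NumberFormat.getNumberInstance(l).format(n)

// The same value renders with different grouping and decimal separators.
val us = format(1234.5, Locale.US)       // "1,234.5"
val de = format(1234.5, Locale.GERMANY)  // "1.234,5"
```

A string written under one locale may therefore be misread or rejected when parsed under another, which is the same failure mode as the date example in the issue.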

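A rough sketch of the proposed direction, assuming a hypothetical helper `usDateFormat` (the actual changes are in pull request 15610 and may look different): once the formatter is constructed with an explicit Locale.US, changing the JVM default locale should no longer affect parsing or formatting.

```scala
import java.text.SimpleDateFormat
import java.util.Locale

// Hypothetical helper (not from the PR): always pin the formatter to Locale.US
// instead of relying on the JVM's default locale.
def usDateFormat(pattern: String): SimpleDateFormat =
  new SimpleDateFormat(pattern, Locale.US)

// Temporarily switch the default locale to one that fails in the issue's
// example, then parse and re-format with the pinned formatter.
val saved = Locale.getDefault
val roundTrip =
  try {
    Locale.setDefault(Locale.GERMANY)
    val fmt = usDateFormat("yyyyMMMdd")
    fmt.format(fmt.parse("1989Dec31"))
  } finally {
    Locale.setDefault(saved) // restore the original default
  }
```

Here `roundTrip` comes back as "1989Dec31" even while the default locale is Locale.GERMANY, whereas a formatter created without an explicit locale would pick up the German month names and reject the input.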


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
