[ 
https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315938#comment-15315938
 ] 

Brett Randall commented on SPARK-15723:
---------------------------------------

Thanks for merging, and thanks for the Scala REPL test. I can confirm that 
this is driven by a combination of *both* the default TimeZone and the default 
Locale: the default Locale affects how the short TZ code is interpreted, which 
makes sense.

{{Australia/Sydney/en_AU}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=AU <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190

scala> time == 1424470877190L
res0: Boolean = false
{noformat}

{{Australia/Sydney/en_US}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=US <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190

scala> time == 1424470877190L
res0: Boolean = false
{noformat}

{{America/New_York/en_US}} -> {color:green}*true*{color}
{noformat}
scala -J-Duser.timezone="America/New_York" -J-Duser.country=US <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424470877190

scala> time == 1424470877190L
res0: Boolean = true
{noformat}

So you were correct - this _can_ be disambiguated by applying a bias to the SDF 
in the code, but it would necessarily be a fixed bias, and it has to be done 
with a {{Calendar}}, not a {{TimeZone}}:

{code}
sdf.setCalendar(Calendar.getInstance(TimeZone.getTimeZone("America/New_York"), 
new Locale("en", "US")))
{code}
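
As a rough sketch of the fixed-bias idea, the pinned calendar could be wired up as below. The class name {{PinnedZoneParser}} and the explicit {{Locale.US}} passed to the formatter itself are illustrative assumptions, not part of the Spark code:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Spark: pins the formatter to one zone and
// locale so short zone names such as "EST" are resolved against a fixed bias.
public class PinnedZoneParser {
    private static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSz";

    public static long parseMillis(String text) {
        // Assumption: Locale.US is also given to the formatter, so the zone-name
        // table it consults does not come from the JVM default locale.
        SimpleDateFormat sdf = new SimpleDateFormat(PATTERN, Locale.US);
        // The fixed bias: the formatter's calendar always uses US Eastern time.
        sdf.setCalendar(Calendar.getInstance(
                TimeZone.getTimeZone("America/New_York"), Locale.US));
        try {
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseMillis("2015-02-20T17:21:17.190EST"));
    }
}
```

With the bias pinned like this, {{EST}} should be read as US Eastern Standard Time whatever {{user.timezone}} the JVM was started with, at the cost of ignoring the local defaults.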

I'm not certain this is better or more correct, but it would remove any 
ambiguity in the short TZ codes, and it could be documented: all short TZ codes 
are evaluated as if they were in this fixed TZ/Locale.  That might upset someone 
whose deployment wants {{MST}} to mean Malaysia Standard Time and not Mountain 
Time.  Make a note here if you think it is worth pursuing further, but I suspect 
we just have to honour the local environment defaults and discourage abbreviated 
TZs.  The test fix is merged now, so all good, thanks.
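
One way to avoid abbreviated zones altogether is to put a numeric offset in the input, which should parse identically under any default TimeZone. A minimal sketch, assuming an RFC 822 style offset ({{-0500}}) and the {{Z}} pattern letter; the class name {{OffsetParseDemo}} and its method are hypothetical:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo: parses a timestamp carrying a numeric offset while the
// JVM default zone is temporarily switched, to show the result does not move.
public class OffsetParseDemo {
    private static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSZ";

    public static long parseUnderDefaultZone(String zoneId, String text) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            // Created after the default is switched, so it picks up zoneId.
            SimpleDateFormat sdf = new SimpleDateFormat(PATTERN, Locale.ROOT);
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        } finally {
            // Restore the original default so callers are not affected.
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        String text = "2015-02-20T17:21:17.190-0500";
        System.out.println(parseUnderDefaultZone("Australia/Sydney", text));
        System.out.println(parseUnderDefaultZone("America/New_York", text));
    }
}
```

Both calls are expected to yield the same epoch value, because the offset in the text fully determines the instant and no zone-name lookup is involved.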

> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ 
> name
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-15723
>                 URL: https://issues.apache.org/jira/browse/SPARK-15723
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Brett Randall
>            Assignee: Brett Randall
>            Priority: Minor
>              Labels: test
>             Fix For: 1.6.2, 2.0.0
>
>
> {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion:
> {code}
>     new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L)
> {code}
> This test is fragile and fails when executing in an environment where the 
> local default timezone causes {{EST}} to be interpreted as something other 
> than US Eastern Standard Time.  If your local timezone is 
> {{Australia/Sydney}}, then {{EST}} is read as Australian Eastern time (in 
> effect {{GMT+11}} on this summer date, 16 hours ahead of US {{EST}}) and you 
> will get:
> {noformat}
> date parsing *** FAILED ***
> 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29)
> {noformat}
> In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} 
> when interpreting short zone names.  According to the {{TimeZone}} javadoc, 
> they ought not be used:
> {quote}
> Three-letter time zone IDs
> For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such 
> as "PST", "CTT", "AST") are also supported. However, their use is deprecated 
> because the same abbreviation is often used for multiple time zones (for 
> example, "CST" could be U.S. "Central Standard Time" and "China Standard 
> Time"), and the Java platform can then only recognize one of them.
> {quote}
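
The ambiguity described in the javadoc quote can be explored with a small sketch that lists every zone ID whose short standard-time name equals a given abbreviation. The class name {{ShortNameClash}} is hypothetical, and the list it prints depends on the JDK version and its locale data, so no particular output is claimed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: collects the zone IDs that share one short name.
public class ShortNameClash {
    public static List<String> zonesWithShortName(String abbrev, Locale locale) {
        List<String> ids = new ArrayList<>();
        for (String id : TimeZone.getAvailableIDs()) {
            // false = standard (non-daylight) name, SHORT = abbreviation style
            String shortName = TimeZone.getTimeZone(id)
                    .getDisplayName(false, TimeZone.SHORT, locale);
            if (abbrev.equals(shortName)) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Many unrelated IDs typically share a single abbreviation.
        System.out.println(zonesWithShortName("CST", Locale.US));
        System.out.println(zonesWithShortName("EST", Locale.US));
    }
}
```

Running it under different {{Locale}} values gives a feel for how many zones a parser has to choose between when it sees a bare three-letter code.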



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
