[ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315938#comment-15315938 ]
Brett Randall commented on SPARK-15723:
---------------------------------------

Thanks for merging. And thanks for the Scala REPL test. I can confirm that this is driven by a combination of *both* the default TimeZone and the default Locale: the default Locale affects how the short TZ code is interpreted, which makes sense.

{{Australia/Sydney/en_AU}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=AU <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190

scala> time == 1424470877190L
res0: Boolean = false
{noformat}

{{Australia/Sydney/en_US}} -> {color:red}*false*{color}
{noformat}
scala -J-Duser.timezone="Australia/Sydney" -J-Duser.country=US <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424413277190

scala> time == 1424470877190L
res0: Boolean = false
{noformat}

{{America/New_York/en_US}} -> {color:green}*true*{color}
{noformat}
scala -J-Duser.timezone="America/New_York" -J-Duser.country=US <<EOF
val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time == 1424470877190L
EOF

scala> val time = (new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")).parse("2015-02-20T17:21:17.190EST").getTime
time: Long = 1424470877190

scala> time == 1424470877190L
res0: Boolean = true
{noformat}

So you were correct: this _can_ be disambiguated by applying a bias to the SDF in the code, but it would necessarily be a fixed bias, and it has to be done with a {{Calendar}}, not a {{TimeZone}}:
{code}
sdf.setCalendar(Calendar.getInstance(TimeZone.getTimeZone("America/New_York"), new Locale("en", "US")))
{code}

I'm not certain this is better or more correct, though it would remove any ambiguity in the short TZ codes, and it could be documented: all short TZ codes are evaluated as if they were in this default TZ/Locale. That might upset someone deploying who wants {{MST}} to mean Malaysia Standard Time and not Mountain Time. Make a note here if you think it is worth pursuing further, but I suspect we just have to honour the local env defaults and discourage abbreviated TZs. And the test fix is merged now, so all good, thanks.

> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ
> name
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-15723
>                 URL: https://issues.apache.org/jira/browse/SPARK-15723
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Brett Randall
>            Assignee: Brett Randall
>            Priority: Minor
>              Labels: test
>             Fix For: 1.6.2, 2.0.0
>
>
> {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion:
> {code}
>     new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L)
> {code}
> This test is fragile and fails when executing in an environment where the
> local default timezone causes {{EST}} to be interpreted as something other
> than US Eastern Standard Time. If your local timezone is
> {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get:
> {noformat}
> date parsing *** FAILED ***
> 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29)
> {noformat}
> In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}}
> when interpreting short zone names.
> According to the {{TimeZone}} javadoc, they ought not be used:
> {quote}
> Three-letter time zone IDs
> For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such
> as "PST", "CTT", "AST") are also supported. However, their use is deprecated
> because the same abbreviation is often used for multiple time zones (for
> example, "CST" could be U.S. "Central Standard Time" and "China Standard
> Time"), and the Java platform can then only recognize one of them.
> {quote}
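
For anyone wondering what "discourage abbreviated TZs" could look like in practice, here is a minimal sketch, not the merged test fix. It assumes the input can carry either a numeric offset or no zone at all; the object name {{ExplicitZoneParsing}} and both helper methods are hypothetical and are not part of Spark's {{SimpleDateParam}}.

```scala
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

// Hypothetical helpers for illustration only; not part of Spark.
object ExplicitZoneParsing {

  // Parses a timestamp that ends in a numeric RFC 822 offset such as "-0500"
  // (pattern letter "Z"). No zone-name lookup takes place, so the JVM's
  // default TimeZone and Locale should not influence the result.
  def parseWithOffset(text: String): Long = {
    val sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.ROOT)
    sdf.parse(text).getTime
  }

  // Parses a zone-less local timestamp and interprets it in a zone named by
  // its full ID (e.g. "America/New_York") instead of a three-letter code.
  def parseInZone(text: String, zoneId: String): Long = {
    val sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS", Locale.ROOT)
    sdf.setTimeZone(TimeZone.getTimeZone(zoneId))
    sdf.parse(text).getTime
  }
}
```

With these, {{ExplicitZoneParsing.parseWithOffset("2015-02-20T17:21:17.190-0500")}} and {{ExplicitZoneParsing.parseInZone("2015-02-20T17:21:17.190", "America/New_York")}} should both give {{1424470877190}} whether {{user.timezone}} is {{Australia/Sydney}} or {{America/New_York}}, because neither call has to guess what {{EST}} means.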