[ https://issues.apache.org/jira/browse/SPARK-31879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-31879:
-----------------------------------

    Assignee:     (was: Kent Yao)

> First day of week changed for non-MONDAY_START Locales
> ------------------------------------------------------
>
>                 Key: SPARK-31879
>                 URL: https://issues.apache.org/jira/browse/SPARK-31879
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Kent Yao
>            Priority: Blocker
>
> h1. cases
> {code:sql}
> spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
> 2019-12-29 00:00:00
> spark-sql> set spark.sql.legacy.timeParserPolicy=legacy;
> spark.sql.legacy.timeParserPolicy legacy
> spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
> 2019-12-30 00:00:00
> {code}
> h1. reasons
> These week-based fields need a Locale to express their semantics, because the
> first day of the week varies from country to country.
> From the Javadoc of WeekFields:
> {code:java}
>     /**
>      * Gets the first day-of-week.
>      * <p>
>      * The first day-of-week varies by culture.
>      * For example, the US uses Sunday, while France and the ISO-8601 standard use Monday.
>      * This method returns the first day using the standard {@code DayOfWeek} enum.
>      *
>      * @return the first day-of-week, not null
>      */
>     public DayOfWeek getFirstDayOfWeek() {
>         return firstDayOfWeek;
>     }
> {code}
> But for SimpleDateFormat, the day-of-week number is not localized:
> ```
> u	Day number of week (1 = Monday, ..., 7 = Sunday)	Number	1
> ```
> Currently, the default locale we use is US, so the result moved one day
> backward.
> For other countries, please refer to [First Day of the Week in Different
> Countries|http://chartsbin.com/view/41671]
> h1. solution options
> 1. Use new Locale("en", "GB") as the default locale.
> 2. For JDK 10 and onwards, we can set the locale Unicode extension 'fw' to
> 'mon', but this does not work on lower JDKs.
> 3. Forbid 'u', give users proper exceptions, and enable and document 'e/c'.
> Currently, 'u' is internally substituted by 'e', but they are not
> equivalent.
> Options 1 and 2 solve this for the default locale, but not for functions
> that support a custom locale.
> cc [~cloud_fan] [~dongjoon] [~maropu]
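
The locale dependence described in the issue can be reproduced with plain java.time, outside Spark. The following is a minimal sketch, not Spark's actual code path: the class and method names are hypothetical, and it resolves the week fields through WeekFields directly instead of going through Spark's formatter. It computes the date for week-based-year 2020, week 1, localized day 1, which are the fields that 'YYYY-w-e' would read from '2020-1-1'.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical demo class; not part of Spark.
public class FirstDayOfWeekDemo {

    /** First day-of-week that java.time derives from the given locale. */
    public static DayOfWeek firstDayOfWeek(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    /** Date for week-based-year 2020, week 1, localized day-of-week 1. */
    public static LocalDate firstDayOfFirstWeek2020(Locale locale) {
        WeekFields wf = WeekFields.of(locale);
        return LocalDate.of(2020, 6, 15)          // any date inside week-based-year 2020
            .with(wf.weekOfWeekBasedYear(), 1)    // move to week 1 of that week-based-year
            .with(wf.dayOfWeek(), 1);             // move to the locale's first day of the week
    }

    public static void main(String[] args) {
        // Locale.UK is the same locale as new Locale("en", "GB") from option 1.
        System.out.println(firstDayOfWeek(Locale.US));           // SUNDAY
        System.out.println(firstDayOfWeek(Locale.UK));           // MONDAY
        System.out.println(firstDayOfFirstWeek2020(Locale.US));  // 2019-12-29
        System.out.println(firstDayOfFirstWeek2020(Locale.UK));  // 2019-12-30

        // Option 2: the 'fw' Unicode extension. Per the issue, only JDK 10 and
        // later are expected to honor it, so the output depends on the JDK in use.
        Locale usMondayStart = Locale.forLanguageTag("en-US-u-fw-mon");
        System.out.println(firstDayOfWeek(usMondayStart));
    }
}
```

The US result matches the new parser's output in the issue (2019-12-29), and the GB result matches the legacy SimpleDateFormat output (2019-12-30), where 'u' always counts from Monday.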