[ https://issues.apache.org/jira/browse/SPARK-30546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan updated SPARK-30546:
--------------------------------
    Description: 
We've decided not to follow the SQL standard when defining the interval type in 3.0. We should try our best to hide intervals from data sources and external catalogs as much as possible, so that we don't leak internals to external systems.

In Spark 2.4, intervals are exposed in the following places:
1. The `CalendarIntervalType` is public.
2. `Column.cast` accepts `CalendarIntervalType` and can cast a string to an interval.
3. `DataFrame.collect` can return `CalendarInterval` objects.
4. UDFs can take `CalendarInterval` as input.
5. Data sources can return `InternalRow` directly, which may contain `CalendarInterval` values.
(Points 2-4 are illustrated by the first sketch at the end of this message.)

In Spark 3.0, we don't want to break Spark 2.4 applications, but we should not expose intervals any more widely than 2.4 did. In general, we should avoid leaking intervals to DS v2 and catalog plugins. We should also revert some PostgreSQL-specific interval features.

  was:
Before 3.0 we may make some efforts to make the current interval type more future-proof, e.g.:
1. Add an unstable annotation to the `CalendarInterval` class. People already use it as UDF input, so it's better to make clear that it's unstable.
2. Add a schema checker that prohibits creating v2 custom catalog tables with intervals, the same as what we do for the built-in catalog (a hypothetical checker is sketched in the second example at the end of this message).
3. Add a schema checker for `DataFrameWriterV2` too.
4. Make the interval type incomparable, as in version 2.4, to avoid ambiguous comparisons between year-month and day-time fields.
5. The newly added (in 3.0) `to_csv` should either not support outputting intervals, matching the CSV file format, or fully support them as normal strings.
6. `to_json` should either not allow intervals as key fields, matching value fields and the JSON data source, with a legacy config to restore the old behavior, or fully support them as normal strings.
7. Revert the interval ISO/ANSI SQL standard output, since we decided not to follow ANSI and there is no round trip.


> Make interval type more future-proof
> ------------------------------------
>
>                 Key: SPARK-30546
>                 URL: https://issues.apache.org/jira/browse/SPARK-30546
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kent Yao
>            Priority: Major
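For context, here is a minimal sketch of exposure points 2-4 above, written against the public Spark 2.4 API (the local-mode session and the sample interval string are assumptions for illustration):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.CalendarIntervalType
import org.apache.spark.unsafe.types.CalendarInterval

object IntervalExposure {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("interval-exposure").getOrCreate()
    import spark.implicits._

    // Point 2: Column.cast accepts CalendarIntervalType, so a string column
    // can be cast to an interval (in 2.4 the string needs the "interval" prefix).
    val df = Seq("interval 1 day 2 hours").toDF("s")
      .select($"s".cast(CalendarIntervalType).as("i"))

    // Point 3: collect() hands the internal CalendarInterval object to the application.
    val iv: CalendarInterval = df.collect().head.getAs[CalendarInterval]("i")
    println(iv) // prints something like: interval 1 days 2 hours

    // Point 4: a UDF can take CalendarInterval as input.
    val monthsOf = udf((i: CalendarInterval) => i.months)
    df.select(monthsOf($"i")).show()

    spark.stop()
  }
}
{code}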
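And a hedged sketch of the kind of schema check proposed in item 2 of the earlier description; `IntervalSchemaCheck` and `failIfInterval` are hypothetical names, not actual Spark APIs:

{code:scala}
import org.apache.spark.sql.types._

object IntervalSchemaCheck {
  // Recursively look for CalendarIntervalType inside nested structs, arrays and maps.
  private def containsInterval(dt: DataType): Boolean = dt match {
    case CalendarIntervalType      => true
    case st: StructType            => st.fields.exists(f => containsInterval(f.dataType))
    case ArrayType(elementType, _) => containsInterval(elementType)
    case MapType(kt, vt, _)        => containsInterval(kt) || containsInterval(vt)
    case _                         => false
  }

  // Hypothetical entry point: a v2 catalog would call this before creating a table,
  // mirroring the check the built-in catalog already performs for interval columns.
  def failIfInterval(schema: StructType): Unit = {
    if (containsInterval(schema)) {
      throw new UnsupportedOperationException(
        "Cannot create a table whose schema contains CalendarIntervalType")
    }
  }
}
{code}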