[ https://issues.apache.org/jira/browse/SPARK-30546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan updated SPARK-30546:
--------------------------------
    Description: 
We've decided not to follow the SQL standard when defining the interval type in 3.0. We should try our best to hide intervals from data sources and external catalogs as much as possible, so that we don't leak internals to external systems.

In Spark 2.4, intervals are exposed in the following places:
1. The `CalendarIntervalType` is public.
2. `Column.cast` accepts `CalendarIntervalType` and can cast a string to an interval.
3. `DataFrame.collect` can return `CalendarInterval` objects.
4. UDFs can take `CalendarInterval` as input.
5. Data sources can return `InternalRow` directly, which may contain `CalendarInterval` values.
(Points 2-4 are illustrated by the first sketch at the end of this message.)

In Spark 3.0, we don't want to break Spark 2.4 applications, but we should not expose intervals any more widely than 2.4 did. In general, we should avoid leaking intervals to DS v2 and catalog plugins. We should also revert some PostgreSQL-specific interval features.

  was:
Before 3.0 we may make some efforts to make the current interval type more future-proof, e.g.:
1. Add an unstable annotation to the `CalendarInterval` class. People already use it as UDF input, so it's better to make clear that it's unstable.
2. Add a schema checker that prohibits creating v2 custom catalog tables with intervals, the same as what we do for the built-in catalog (a hypothetical checker is sketched in the second example at the end of this message).
3. Add a schema checker for `DataFrameWriterV2` too.
4. Make the interval type incomparable, as in version 2.4, to avoid ambiguous comparisons between year-month and day-time fields.
5. The newly added (in 3.0) `to_csv` should either not support outputting intervals, matching the CSV file format, or fully support them as normal strings.
6. `to_json` should either not allow intervals as key fields, matching value fields and the JSON data source, with a legacy config to restore the old behavior, or fully support them as normal strings.
7. Revert the interval ISO/ANSI SQL standard output, since we decided not to follow ANSI and there is no round trip.


> Make interval type more future-proof
> ------------------------------------
>
>                 Key: SPARK-30546
>                 URL: https://issues.apache.org/jira/browse/SPARK-30546
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kent Yao
>            Priority: Major
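For context, here is a minimal sketch of exposure points 2-4 above, written against the public Spark 2.4 API (the local-mode session and the sample interval string are assumptions for illustration):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.CalendarIntervalType
import org.apache.spark.unsafe.types.CalendarInterval

object IntervalExposure {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("interval-exposure").getOrCreate()
    import spark.implicits._

    // Point 2: Column.cast accepts CalendarIntervalType, so a string column
    // can be cast to an interval (in 2.4 the string needs the "interval" prefix).
    val df = Seq("interval 1 day 2 hours").toDF("s")
      .select($"s".cast(CalendarIntervalType).as("i"))

    // Point 3: collect() hands the internal CalendarInterval object to the application.
    val iv: CalendarInterval = df.collect().head.getAs[CalendarInterval]("i")
    println(iv) // prints something like: interval 1 days 2 hours

    // Point 4: a UDF can take CalendarInterval as input.
    val monthsOf = udf((i: CalendarInterval) => i.months)
    df.select(monthsOf($"i")).show()

    spark.stop()
  }
}
{code}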
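And a hedged sketch of the kind of schema check proposed in item 2 of the earlier description; `IntervalSchemaCheck` and `failIfInterval` are hypothetical names, not actual Spark APIs:

{code:scala}
import org.apache.spark.sql.types._

object IntervalSchemaCheck {
  // Recursively look for CalendarIntervalType inside nested structs, arrays and maps.
  private def containsInterval(dt: DataType): Boolean = dt match {
    case CalendarIntervalType      => true
    case st: StructType            => st.fields.exists(f => containsInterval(f.dataType))
    case ArrayType(elementType, _) => containsInterval(elementType)
    case MapType(kt, vt, _)        => containsInterval(kt) || containsInterval(vt)
    case _                         => false
  }

  // Hypothetical entry point: a v2 catalog would call this before creating a table,
  // mirroring the check the built-in catalog already performs for interval columns.
  def failIfInterval(schema: StructType): Unit = {
    if (containsInterval(schema)) {
      throw new UnsupportedOperationException(
        "Cannot create a table whose schema contains CalendarIntervalType")
    }
  }
}
{code}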