[
https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-42635.
---------------------------------
> Several counter-intuitive behaviours in the TimestampAdd expression
> -------------------------------------------------------------------
>
> Key: SPARK-42635
> URL: https://issues.apache.org/jira/browse/SPARK-42635
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Reporter: Chenhao Li
> Assignee: Chenhao Li
> Priority: Major
> Fix For: 3.3.3, 3.4.1
>
>
> # When the time is close to daylight saving time transition, the result may
> be discontinuous and not monotonic.
> We currently have:
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> spark.sql("select timestampadd(second, 24 * 3600 - 1,
> timestamp'2011-03-12 03:00:00')").show
> +------------------------------------------------------------------------+
> |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')|
> +------------------------------------------------------------------------+
> | 2011-03-13 03:59:59|
> +------------------------------------------------------------------------+
> scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12
> 03:00:00')").show
> +------------------------------------------------------------------+
> |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')|
> +------------------------------------------------------------------+
> | 2011-03-13 03:00:00|
> +------------------------------------------------------------------+ {code}
>
> In the second query, adding one more second will set the time back one hour
> instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12
> 03:00:00}} to {{2011-03-13 03:00:00}}, instead of {{24 * 3600}} seconds, due
> to the daylight saving time transition.
> The root cause of the problem is the Spark code at
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790]
> wrongly assumes every day has {{MICROS_PER_DAY}} seconds, and does the day
> and time-in-day split before looking at the timezone.
> 2. Adding month, quarter, and year silently ignores Int overflow during unit
> conversion.
> The root cause is
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246].
> {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking
> overflow. Note that we do have overflow checking in adding the amount to the
> timestamp, so the behavior is inconsistent.
> This can cause counter-intuitive results like this:
> {code:scala}
> scala> spark.sql("select timestampadd(quarter, 1431655764,
> timestamp'1970-01-01')").show
> +------------------------------------------------------------------+
> |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')|
> +------------------------------------------------------------------+
> | 1969-09-01 00:00:00|
> +------------------------------------------------------------------+{code}
> 3. Adding sub-month units (week, day, hour, minute, second, millisecond,
> microsecond)silently ignores Long overflow during unit conversion.
> This is similar to the previous problem:
> {code:scala}
> scala> spark.sql("select timestampadd(day, 106751992,
> timestamp'1970-01-01')").show(false)
> +-------------------------------------------------------------+
> |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')|
> +-------------------------------------------------------------+
> |-290308-12-22 15:58:10.448384 |
> +-------------------------------------------------------------+{code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]