[https://issues.apache.org/jira/browse/SPARK-56663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Dongjoon Hyun updated SPARK-56663:
----------------------------------
    Fix Version/s:     (was: 5.0.0)

> Restore fast path for date_trunc MINUTE/HOUR/DAY
> ------------------------------------------------
>
>                 Key: SPARK-56663
>                 URL: https://issues.apache.org/jira/browse/SPARK-56663
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Rito Takeuchi
>            Assignee: Rito Takeuchi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0
>
>
> h2. Background
> Before SPARK-33404 (Nov 2020), {{DateTimeUtils.truncTimestamp}} had a fast path
> for the MINUTE level that truncated with plain
> {{Math.floorMod(micros, MICROS_PER_MINUTE)}} arithmetic on the UTC microseconds.
> SPARK-33404 fixed a correctness bug: in {{America/Los_Angeles}},
> {{date_trunc('minute', '1769-10-17 17:10:02')}} returned the input timestamp
> unchanged. Before 1883, LA used the LMT offset {{-07:52:58}}, so that local time
> is 1769-10-18 01:03:00 UTC, which is already minute-aligned in UTC; because the
> fast path ignored the time-zone offset entirely, the floorMod removed nothing.
> The fix routed MINUTE / HOUR / DAY through
> {{microsToInstant().atZone(zoneId).truncatedTo()}} instead. The author noted on
> the PR thread:
> bq. "Slow down is around 5.5 times" -- MaxGekk on PR #30338
> That regression has remained for ~5.5 years with no follow-up ticket.
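To make the failure mode concrete, here is a minimal Java sketch operating on epoch microseconds. {{naiveTruncToMinute}} is a hypothetical reconstruction of the old UTC-only arithmetic (not Spark's actual code), and {{zoneAwareTruncToMinute}} follows the {{atZone(...).truncatedTo(...)}} route the fix adopted; the class name {{LmtMinuteBug}} and all helpers are illustrative:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;

// Illustrative only: names are hypothetical, not taken from Spark.
final class LmtMinuteBug {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long MICROS_PER_MINUTE = 60L * MICROS_PER_SECOND;

    // Epoch microseconds -> Instant, flooring so pre-1970 values work.
    static Instant microsToInstant(long micros) {
        return Instant.ofEpochSecond(
            Math.floorDiv(micros, MICROS_PER_SECOND),
            Math.floorMod(micros, MICROS_PER_SECOND) * 1_000L);
    }

    static long instantToMicros(Instant instant) {
        return Math.addExact(
            Math.multiplyExact(instant.getEpochSecond(), MICROS_PER_SECOND),
            instant.getNano() / 1_000L);
    }

    // Old-style fast path: truncates in UTC and never consults the zone.
    static long naiveTruncToMinute(long micros) {
        return micros - Math.floorMod(micros, MICROS_PER_MINUTE);
    }

    // Zone-aware truncation via java.time, as in the SPARK-33404 fix.
    static long zoneAwareTruncToMinute(long micros, ZoneId zone) {
        return instantToMicros(
            microsToInstant(micros).atZone(zone).truncatedTo(ChronoUnit.MINUTES).toInstant());
    }

    // Wall-clock time of an epoch-micros value in the given zone.
    static LocalDateTime local(long micros, ZoneId zone) {
        return microsToInstant(micros).atZone(zone).toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        long input = instantToMicros(
            LocalDateTime.of(1769, 10, 17, 17, 10, 2).atZone(la).toInstant());
        System.out.println(local(naiveTruncToMinute(input), la));        // 1769-10-17T17:10:02
        System.out.println(local(zoneAwareTruncToMinute(input, la), la)); // 1769-10-17T17:10
    }
}
```

The naive version returns its input untouched because the instant is already minute-aligned in UTC; only the zone-aware version operates on the 17:10:02 wall-clock time.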
> h2. Proposal
> Re-introduce a fast path for MINUTE / HOUR / DAY that resolves the zone offset
> once for the input instant, truncates with {{Math.floorMod}} in local time, and
> falls back to the slow path when the offset at the candidate truncated instant
> differs from the offset at the original instant (the SPARK-30766 / SPARK-30857
> DST guard). Sub-minute LMT offsets (SPARK-33404) and 30/45-minute zones
> (Asia/Kolkata +05:30, Asia/Kathmandu +05:45) are handled correctly because the
> offset is applied before the floorMod, so no separate offset-alignment guard is
> needed; DST transitions are handled by the fallback.
> h2. Benchmark
> DateTimeBenchmark, wholestage on, {{ns/row}} on a 12th Gen Intel i7-1260P:
> || level || baseline (ns/row) || fast path (ns/row) || speedup ||
> | MINUTE  | 136.0 |  69.5 | 1.96x |
> | HOUR    | 136.2 |  69.8 | 1.95x |
> | DAY     | 136.5 |  69.8 | 1.96x |
> | DD (alias of DAY) | 136.7 |  70.1 | 1.95x |
> This recovers most of the SPARK-33404 regression while keeping all of its
> correctness guarantees.
> h2. Out of scope
> Date-level units (WEEK / MONTH / QUARTER / YEAR and {{trunc(date, ...)}})
> still go through {{microsToDays}} / {{daysToMicros}}. Those can be addressed in
> a separate follow-up ticket.
> h2. Related
> * SPARK-33404 - introduced the slowdown to fix the LMT minute bug (closed)
> * SPARK-30766 / SPARK-30857 - earlier fixes for HOUR / DAY (closed); their
>   DST-correctness invariants are preserved here via an offset-equality check.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
