[ 
https://issues.apache.org/jira/browse/SPARK-56769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rito Takeuchi updated SPARK-56769:
----------------------------------
    Description: 
h2. Background

`DateTimeUtils.truncTimestamp` for the WEEK / MONTH / QUARTER / YEAR levels 
currently routes through:

{code:scala}
case _ => // Try to truncate date levels
  val dDays = microsToDays(micros, zoneId)
  daysToMicros(truncDate(dDays, level), zoneId)
{code}

`microsToDays` allocates `Instant` + `ZonedDateTime` + `LocalDate` per row; 
`daysToMicros` allocates `LocalDate` + `ZonedDateTime` + `Instant`. `truncDate` 
itself allocates one more `LocalDate` for MONTH/YEAR (in `getDayOfMonth` / 
`getDayInYear`) and *two* for QUARTER (the existing implementation goes through 
`IsoFields.DAY_OF_QUARTER`, a `TemporalField` whose `with(...)` adjustment 
produces a fresh `LocalDate`). The result is 167-218 ns/row on JDK 17 GH 
Actions runners.
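
For illustration, the allocation chain described above can be sketched with 
plain `java.time` calls. This is a rough approximation, not the actual Spark 
code: `SlowPathSketch`, `microsToDaysSketch`, `daysToMicrosSketch` and 
`truncToMonthSlow` are hypothetical names, and the real helpers may differ in 
detail.

{code:scala}
import java.time.{Instant, LocalDate, ZoneId}

object SlowPathSketch {
  private val MicrosPerSecond = 1000000L

  // Hypothetical stand-in for microsToDays: Instant -> ZonedDateTime -> LocalDate.
  def microsToDaysSketch(micros: Long, zoneId: ZoneId): Int = {
    val instant = Instant.ofEpochSecond(
      Math.floorDiv(micros, MicrosPerSecond),
      Math.floorMod(micros, MicrosPerSecond) * 1000L)
    Math.toIntExact(instant.atZone(zoneId).toLocalDate.toEpochDay)
  }

  // Hypothetical stand-in for daysToMicros: LocalDate -> ZonedDateTime -> Instant.
  def daysToMicrosSketch(days: Int, zoneId: ZoneId): Long = {
    val instant = LocalDate.ofEpochDay(days.toLong).atStartOfDay(zoneId).toInstant
    Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, MicrosPerSecond),
      instant.getNano / 1000L)
  }

  // MONTH truncation through the object-heavy route: every call builds several
  // short-lived java.time objects on the way in and on the way out.
  def truncToMonthSlow(micros: Long, zoneId: ZoneId): Long = {
    val days = microsToDaysSketch(micros, zoneId)
    val firstOfMonth = LocalDate.ofEpochDay(days.toLong).withDayOfMonth(1)
    daysToMicrosSketch(Math.toIntExact(firstOfMonth.toEpochDay), zoneId)
  }
}
{code}

Each row pays for the temporaries on both conversions, which is the cost the 
offset-arithmetic path is meant to avoid.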

SPARK-56663 introduced the offset-arithmetic + DST-equality-guard pattern for 
the time-level units (MINUTE / HOUR / DAY) and confirmed the pattern is sound 
for any unit that evenly divides {{MICROS_PER_DAY}}. The date-level branch is a 
natural extension.

h2. Proposal

The framework for offset-based truncation -- resolve the offset once, shift 
into the local frame, truncate there, shift back to UTC, check the DST guard, 
and fall back on a DST crossing or arithmetic overflow -- is identical for 
every level above SECOND. Only the "truncate in local frame" step varies. This 
PR inlines SPARK-56663's `truncToUnitFast` together with the new date-level 
path directly into `truncTimestamp`, and keeps a single private 
`truncTimestampSlow` as a complete reference implementation that the fast path 
falls back to:

{code:scala}
def truncTimestamp(micros: Long, level: Int, zoneId: ZoneId): Long = {
  // MICROSECOND / MILLISECOND / SECOND short-circuits (no zone work).
  // Offset arithmetic for every other level.
  // DST guard, fallback to truncTimestampSlow.
}

private def truncTimestampSlow(micros: Long, level: Int, zoneId: ZoneId): Long
{code}

The local-frame truncation step is the only thing the fast path branches on:

* MICROSECOND / MILLISECOND / SECOND -- pure UTC `floorMod` (zone offsets have 
at most second precision per `java.time.ZoneOffset`; no zone information 
needed).
* MINUTE / HOUR / DAY -- shifted-local `floorMod` against the unit micros.
* WEEK / MONTH / QUARTER / YEAR -- compute local epoch-day by integer division, 
run `truncDate` in the local-day frame, multiply back to local micros.
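
As a small worked example of the first bullet: because a zone offset is a 
whole number of seconds, flooring to a sub-day unit that divides one second 
gives the same result in UTC as in the shifted local frame. The sketch below 
uses hypothetical names (`SubSecondTruncSketch`, `truncToUnitUtc`, 
`truncViaLocal`) and ignores overflow near `Long.MinValue`.

{code:scala}
object SubSecondTruncSketch {
  val MicrosPerMillis = 1000L
  val MicrosPerSecond = 1000000L

  // Floor to a multiple of unitMicros; floorMod keeps pre-epoch values
  // rounding toward negative infinity.
  def truncToUnitUtc(micros: Long, unitMicros: Long): Long =
    micros - Math.floorMod(micros, unitMicros)

  // The same truncation done in a local frame shifted by a whole-second offset.
  def truncViaLocal(micros: Long, unitMicros: Long, offsetSeconds: Int): Long = {
    val offset = offsetSeconds * MicrosPerSecond
    truncToUnitUtc(micros + offset, unitMicros) - offset
  }
}
{code}

For SECOND and below the two functions agree for any whole-second offset, so 
the zone can be skipped entirely for those levels.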

Everything else (offset resolve via `rules.getOffset`, `addExact` / 
`subtractExact`, DST guard via offset-equality at the candidate, slow-path 
fallback) is shared.
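
A minimal sketch of that shared skeleton, assuming the shape described above 
rather than reproducing the actual patch. `FastPathSketch`, `truncFast`, 
`truncToDayLocal` and `truncToMonthLocal` are hypothetical names; returning 
`None` stands in for "fall back to `truncTimestampSlow`".

{code:scala}
import java.time.{Instant, LocalDate, ZoneId}

object FastPathSketch {
  val MicrosPerSecond = 1000000L
  val MicrosPerDay = 86400L * MicrosPerSecond

  // Zone offset in microseconds at the given UTC instant.
  private def offsetMicros(utcMicros: Long, zoneId: ZoneId): Long = {
    val instant = Instant.ofEpochSecond(Math.floorDiv(utcMicros, MicrosPerSecond))
    zoneId.getRules.getOffset(instant).getTotalSeconds * MicrosPerSecond
  }

  // Shared steps: resolve offset, shift to local, truncate, shift back, guard.
  // None means the caller should use the slow reference path instead.
  def truncFast(micros: Long, zoneId: ZoneId)(truncLocal: Long => Long): Option[Long] =
    try {
      val offset = offsetMicros(micros, zoneId)
      val local = Math.addExact(micros, offset)
      val candidate = Math.subtractExact(truncLocal(local), offset)
      // DST guard: the offset at the candidate must match the one we assumed.
      if (offsetMicros(candidate, zoneId) == offset) Some(candidate) else None
    } catch {
      case _: ArithmeticException => None // overflow near the Long range limits
    }

  // Time-level step: floor to a whole local day.
  def truncToDayLocal(local: Long): Long =
    Math.subtractExact(local, Math.floorMod(local, MicrosPerDay))

  // Date-level step: local epoch-day, truncate the date, back to local micros.
  def truncToMonthLocal(local: Long): Long = {
    val localDay = Math.floorDiv(local, MicrosPerDay)
    val firstOfMonth = LocalDate.ofEpochDay(localDay).withDayOfMonth(1)
    Math.multiplyExact(firstOfMonth.toEpochDay, MicrosPerDay)
  }
}
{code}

In `America/Los_Angeles`, a mid-March timestamp truncated to MONTH lands on 
the other side of the spring-forward transition, so the guard rejects the 
candidate and the sketch reports a fallback; a timestamp in early March stays 
on the fast path.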

Also rewrite `TRUNC_TO_QUARTER` from `IsoFields.DAY_OF_QUARTER` (a 
`TemporalField` whose `with(...)` adjustment produces a fresh `LocalDate`) to a 
direct `withMonth(firstMonthOfQuarter).withDayOfMonth(1)` chain on the existing 
`LocalDate`. This saves one allocation plus the field-adjustment overhead per 
call.
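
The two formulations can be compared side by side. This is only a sketch: 
`QuarterTruncSketch`, `viaField` and `viaMonth` are hypothetical names, and the 
`firstMonthOfQuarter` computation is one plausible way to derive it.

{code:scala}
import java.time.LocalDate
import java.time.temporal.IsoFields

object QuarterTruncSketch {
  // Existing style: set the day-of-quarter field to 1.
  def viaField(date: LocalDate): LocalDate =
    date.`with`(IsoFields.DAY_OF_QUARTER, 1L)

  // Proposed style: jump to the first month of the quarter, then to day 1.
  def viaMonth(date: LocalDate): LocalDate = {
    val firstMonthOfQuarter = ((date.getMonthValue - 1) / 3) * 3 + 1
    date.withMonth(firstMonthOfQuarter).withDayOfMonth(1)
  }
}
{code}

Both return the first day of the quarter; `withMonth` and `withDayOfMonth` 
return the receiver unchanged when the value already matches, so dates that 
already sit on a quarter start allocate nothing.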

`truncTimestampSlow` covers every level explicitly so it serves as a 
self-contained reference implementation -- the fast path's correctness can be 
verified against it case-by-case.
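
One way such a case-by-case check could look, shown for WEEK only. This is an 
illustrative sketch, not the test suite from the PR: `WeekTruncCheck` and its 
members are hypothetical, overflow handling is omitted, and the sampled range 
is assumed to stay far from the `Long` limits.

{code:scala}
import java.time.{DayOfWeek, Instant, ZoneId}
import java.time.temporal.TemporalAdjusters

object WeekTruncCheck {
  val MicrosPerSecond = 1000000L
  val MicrosPerDay = 86400L * MicrosPerSecond

  private def toInstant(micros: Long): Instant =
    Instant.ofEpochSecond(Math.floorDiv(micros, MicrosPerSecond))

  // Reference: go through java.time end to end.
  def truncWeekSlow(micros: Long, zoneId: ZoneId): Long = {
    val monday = toInstant(micros).atZone(zoneId).toLocalDate
      .`with`(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY))
    monday.atStartOfDay(zoneId).toInstant.getEpochSecond * MicrosPerSecond
  }

  // Fast path: offset arithmetic plus the offset-equality guard.
  def truncWeekFast(micros: Long, zoneId: ZoneId): Long = {
    val rules = zoneId.getRules
    val offset = rules.getOffset(toInstant(micros)).getTotalSeconds * MicrosPerSecond
    val localDay = Math.floorDiv(micros + offset, MicrosPerDay)
    // Epoch day 0 (1970-01-01) is a Thursday, so (day + 3) mod 7 counts days since Monday.
    val mondayDay = localDay - Math.floorMod(localDay + 3L, 7L)
    val candidate = mondayDay * MicrosPerDay - offset
    val sameOffset =
      rules.getOffset(toInstant(candidate)).getTotalSeconds * MicrosPerSecond == offset
    if (sameOffset) candidate else truncWeekSlow(micros, zoneId)
  }

  // Counts sampled inputs on which the two implementations disagree.
  def mismatches(zoneId: ZoneId, startMicros: Long, stepMicros: Long, count: Int): Int =
    (0 until count).count { i =>
      val m = startMicros + i * stepMicros
      truncWeekFast(m, zoneId) != truncWeekSlow(m, zoneId)
    }
}
{code}

Sweeping a full year in hourly steps across a DST-observing zone exercises 
both the guarded fast path and the fallback.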

h2. Benchmark

`DateTimeBenchmark` Truncation, wholestage on, ns/row, on a 12th Gen Intel 
i7-1260P (master = pre-SPARK-56663):

|| level || master baseline || this PR || speedup ||
| WEEK    | 165.2 |  78.2 | 2.11x |
| MONTH   | 181.9 |  92.2 | 1.97x |
| MM      | 182.2 |  92.5 | 1.97x |
| MON     | 182.9 |  92.7 | 1.97x |
| QUARTER | 216.8 | 108.8 | 1.99x |
| YEAR    | 205.2 |  96.7 | 2.12x |
| YYYY    | 205.8 |  96.9 | 2.12x |
| YY      | 206.3 |  96.0 | 2.15x |

Time-level units (MINUTE / HOUR / DAY / SECOND) and `trunc(date, ...)` are 
unchanged within noise; this PR's hot path for those levels is byte-identical 
to SPARK-56663 after the unification.

h2. Out of scope

* `trunc(date, ...)` (date input, no zoneId) -- this PR only changes the 
`timestamp -> date_trunc` flow. The `TruncDate` expression bypasses 
`truncTimestamp`; the only change visible to it is the `TRUNC_TO_QUARTER` 
rewrite (which `trunc(date, ...)` doesn't use in the benchmark today).
* MICROSECOND / MILLISECOND / SECOND / MINUTE / HOUR / DAY -- handled by 
SPARK-56663. The unification in this PR inlines the existing fast path into 
`truncTimestamp` but does not change its semantics or measured perf.
* Per-instance offset cache -- a separate optimization that would amortize the 
{{rules.getOffset}} call across rows. Would benefit both this PR's and 
SPARK-56663's paths. Out of scope here.
* Integer-only calendar arithmetic (Hinnant-style) -- would eliminate the 
remaining `LocalDate` allocation inside `truncDate` for MONTH/YEAR and push 
date-level units to the same floor as time-level units. Out of scope here.
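
For context on the last bullet, the integer-only approach could look roughly 
like the sketch below, which follows Howard Hinnant's published 
`civil_from_days` algorithm. It is not part of this PR; `CivilSketch`, 
`civilFromDays` and `truncToMonthDays` are hypothetical names.

{code:scala}
object CivilSketch {
  // Days since 1970-01-01 -> (year, month, day) in the proleptic Gregorian
  // calendar, using only integer arithmetic (after Hinnant's civil_from_days).
  def civilFromDays(epochDay: Long): (Long, Int, Int) = {
    val z = epochDay + 719468L                  // shift epoch to 0000-03-01
    val era = Math.floorDiv(z, 146097L)         // 400-year eras
    val doe = z - era * 146097L                 // day of era, [0, 146096]
    val yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365 // [0, 399]
    val doy = doe - (365 * yoe + yoe / 4 - yoe / 100)               // [0, 365]
    val mp = (5 * doy + 2) / 153                // month index from March, [0, 11]
    val d = (doy - (153 * mp + 2) / 5 + 1).toInt
    val m = (if (mp < 10) mp + 3 else mp - 9).toInt
    val y = yoe + era * 400 + (if (m <= 2) 1 else 0)
    (y, m, d)
  }

  // MONTH truncation on epoch days without allocating a LocalDate.
  def truncToMonthDays(epochDay: Long): Long =
    epochDay - (civilFromDays(epochDay)._3 - 1)
}
{code}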

h2. Related

* SPARK-56663 - introduced the offset-arithmetic fast path for MINUTE / HOUR / 
DAY; this PR extends the same pattern to the date-level units and inlines both 
paths into a single implementation.
* SPARK-33404 - introduced the slow path that this family of changes is 
recovering from.
* SPARK-30766 / SPARK-30857 - the DST-correctness invariants from these fixes 
are preserved here via the offset-equality guard.


  was:
h2. Background

`DateTimeUtils.truncTimestamp` for the WEEK / MONTH / QUARTER / YEAR levels 
currently routes through:

{code:scala}
case _ => // Try to truncate date levels
  val dDays = microsToDays(micros, zoneId)
  daysToMicros(truncDate(dDays, level), zoneId)
{code}

`microsToDays` allocates `Instant` + `ZonedDateTime` + `LocalDate` per row; 
`daysToMicros` allocates `LocalDate` + `ZonedDateTime` + `Instant`. `truncDate` 
itself allocates one more `LocalDate` for MONTH/YEAR (in `getDayOfMonth` / 
`getDayInYear`) and *two* for QUARTER (the existing implementation goes through 
`IsoFields.DAY_OF_QUARTER`, which is a `TemporalAdjuster` that produces a fresh 
`LocalDate`). The result is 167-218 ns/row on JDK 17 GH Actions runners.

SPARK-56663 introduced the offset-arithmetic + DST-equality-guard pattern for 
the time-level units (MINUTE / HOUR / DAY) and confirmed the pattern is sound 
for any unit that evenly divides {{MICROS_PER_DAY}}. The date-level branch is a 
natural extension.

h2. Proposal

The framework for offset-based truncation -- resolve offset once, apply, 
truncate in the local frame, re-apply, DST guard, fall back on 
DST-cross/overflow -- is identical for every level above SECOND. Only the 
"truncate in local frame" step varies. Fold the existing `truncToUnitFast` 
(SPARK-56663) and the new date-level path into a single helper:

{code:scala}
private def truncTimestampFast(micros: Long, zoneId: ZoneId, level: Int): Long
private def truncTimestampSlow(micros: Long, zoneId: ZoneId, level: Int): Long
{code}

The local-frame truncation step is the only thing the fast path branches on:

* MINUTE / HOUR / DAY -- {{local - Math.floorMod(local, unitMicros)}}; pure 
arithmetic, the existing path.
* WEEK / MONTH / QUARTER / YEAR -- compute local epoch-day by integer division, 
run [[truncDate]] in the local-day frame, multiply back to local micros.

Everything else (offset resolve, `addExact` / `subtractExact`, DST guard, 
slow-path fallback) is shared.

Also rewrite `TRUNC_TO_QUARTER` from `IsoFields.DAY_OF_QUARTER` (a 
`TemporalAdjuster` that produces a fresh `LocalDate`) to a direct 
`withMonth(firstMonthOfQuarter).withDayOfMonth(1)` chain on the existing 
`LocalDate`. Saves one allocation + the adjuster overhead per call.

h2. Benchmark

`DateTimeBenchmark` Truncation, wholestage on, ns/row, on a 12th Gen Intel 
i7-1260P (master = pre-SPARK-56663):

|| level || master baseline || this PR || speedup ||
| WEEK    | 165.2 |  78.2 | 2.11x |
| MONTH   | 181.9 |  92.2 | 1.97x |
| MM      | 182.2 |  92.5 | 1.97x |
| MON     | 182.9 |  92.7 | 1.97x |
| QUARTER | 216.8 | 108.8 | 1.99x |
| YEAR    | 205.2 |  96.7 | 2.12x |
| YYYY    | 205.8 |  96.9 | 2.12x |
| YY      | 206.3 |  96.0 | 2.15x |

Time-level units (MINUTE / HOUR / DAY / SECOND) and `trunc(date, ...)` are 
unchanged within noise; this PR's hot path for those levels is byte-identical 
to SPARK-56663 after the unification.

h2. Out of scope

* `trunc(date, ...)` (date input, no zoneId) -- this PR only changes the 
`timestamp -> date_trunc` flow. The `TruncDate` expression bypasses 
`truncTimestamp`; the only change visible to it is the `TRUNC_TO_QUARTER` 
rewrite (which `trunc(date, ...)` doesn't use in the benchmark today).
* MICROSECOND / MILLISECOND / SECOND / MINUTE / HOUR / DAY -- handled by 
SPARK-56663. The unification in this PR refactors the existing fast path but 
does not change its semantics or measured perf.
* Per-instance offset cache -- a separate optimization that would amortize the 
{{rules.getOffset}} call across rows. Would benefit both this PR's and 
SPARK-56663's paths. Out of scope here.
* Integer-only calendar arithmetic (Hinnant-style) -- would eliminate the 
remaining `LocalDate` allocation inside `truncDate` for MONTH/YEAR and push 
date-level units to the same floor as time-level units. Out of scope here.

h2. Related

* SPARK-56663 - introduced the offset-arithmetic fast path for MIN/HR/DAY; this 
PR extends the same pattern to the date-level units and folds both paths into a 
single helper.
* SPARK-33404 - introduced the slow path that this family of changes is 
recovering from.
* SPARK-30766 / SPARK-30857 - the DST-correctness invariants from these fixes 
are preserved here via the offset-equality guard.



> Add fast path for date_trunc WEEK/MONTH/QUARTER/YEAR
> ----------------------------------------------------
>
>                 Key: SPARK-56769
>                 URL: https://issues.apache.org/jira/browse/SPARK-56769
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Rito Takeuchi
>            Priority: Major
>


