[
https://issues.apache.org/jira/browse/SPARK-56769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rito Takeuchi updated SPARK-56769:
----------------------------------
Description:
h2. Background
`DateTimeUtils.truncTimestamp` for the WEEK / MONTH / QUARTER / YEAR levels
currently routes through:
{code:scala}
case _ => // Try to truncate date levels
val dDays = microsToDays(micros, zoneId)
daysToMicros(truncDate(dDays, level), zoneId)
{code}
`microsToDays` allocates `Instant` + `ZonedDateTime` + `LocalDate` per row;
`daysToMicros` allocates `LocalDate` + `ZonedDateTime` + `Instant`. `truncDate`
itself allocates one more `LocalDate` for MONTH/YEAR (in `getDayOfMonth` /
`getDayInYear`) and *two* for QUARTER (the existing implementation goes through
`IsoFields.DAY_OF_QUARTER`, a `TemporalField` whose adjustment produces a fresh
`LocalDate`). The result is 167-218 ns/row on JDK 17 GH Actions runners.
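SPARK-56663 introduced the offset-arithmetic + DST-equality-guard pattern for
the time-level units (MINUTE / HOUR / DAY) and confirmed the pattern is sound
for any unit that evenly divides {{MICROS_PER_DAY}}. Date-level units do not
divide a day, but their boundaries always fall on a local midnight, so the same
pattern applies once the truncation step works on local epoch-days. The
date-level branch is therefore a natural extension.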
h2. Proposal
The framework for offset-based truncation -- resolve the offset once, shift
into the local frame, truncate there, shift back, check the DST guard, and fall
back on a DST crossing or arithmetic overflow -- is identical for every level
above SECOND. Only the
"truncate in local frame" step varies. This PR inlines SPARK-56663's
`truncToUnitFast` together with the new date-level path directly into
`truncTimestamp`, and keeps a single private `truncTimestampSlow` as a complete
reference implementation that the fast path falls back to:
{code:scala}
def truncTimestamp(micros: Long, level: Int, zoneId: ZoneId): Long = {
// MICROSECOND / MILLISECOND / SECOND short-circuits (no zone work).
// Offset arithmetic for every other level.
// DST guard, fallback to truncTimestampSlow.
}
private def truncTimestampSlow(micros: Long, level: Int, zoneId: ZoneId): Long
{code}
The local-frame truncation step is the only thing the fast path branches on:
* MICROSECOND / MILLISECOND / SECOND -- pure UTC `floorMod` (zone offsets have
at most second precision per `java.time.ZoneOffset`; no zone information
needed).
* MINUTE / HOUR / DAY -- shifted-local `floorMod` against the unit micros.
* WEEK / MONTH / QUARTER / YEAR -- compute local epoch-day by integer division,
run `truncDate` in the local-day frame, multiply back to local micros.
Everything else (offset resolve via `rules.getOffset`, `addExact` /
`subtractExact`, DST guard via offset-equality at the candidate, slow-path
fallback) is shared.
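The shared steps can be sketched roughly as follows for a single date-level unit. This is only an illustration under stated assumptions, not the code in the PR: the object and method names ({{TruncMonthSketch}}, {{fast}}, {{slow}}) are hypothetical, only MONTH is handled, and the slow path here is a plain `java.time` round trip standing in for `truncTimestampSlow`.

{code:scala}
import java.time.{Instant, ZoneId, LocalDate}
import java.time.temporal.ChronoUnit

// Illustrative sketch only: names are hypothetical, MONTH is the only level shown.
object TruncMonthSketch {
  private val MicrosPerSecond = 1000000L
  private val MicrosPerDay = 86400L * MicrosPerSecond

  // Zone offset, in microseconds, in effect at the given instant.
  private def offsetMicros(micros: Long, zoneId: ZoneId): Long = {
    val instant = Instant.ofEpochSecond(Math.floorDiv(micros, MicrosPerSecond))
    zoneId.getRules.getOffset(instant).getTotalSeconds * MicrosPerSecond
  }

  // Reference path: go through java.time objects for every call.
  def slow(micros: Long, zoneId: ZoneId): Long = {
    val date = Instant.EPOCH.plus(micros, ChronoUnit.MICROS).atZone(zoneId).toLocalDate
    val start = date.withDayOfMonth(1).atStartOfDay(zoneId).toInstant
    ChronoUnit.MICROS.between(Instant.EPOCH, start)
  }

  // Fast path: offset arithmetic, truncation on the local epoch-day, DST guard.
  def fast(micros: Long, zoneId: ZoneId): Long = {
    try {
      val offset = offsetMicros(micros, zoneId)                // resolve offset once
      val local = Math.addExact(micros, offset)                // shift into local frame
      val localDay = Math.floorDiv(local, MicrosPerDay)        // local epoch-day
      val firstOfMonth = LocalDate.ofEpochDay(localDay).withDayOfMonth(1).toEpochDay
      val truncatedLocal = Math.multiplyExact(firstOfMonth, MicrosPerDay)
      val candidate = Math.subtractExact(truncatedLocal, offset) // shift back
      // DST guard: the candidate is trusted only if the same offset applies there.
      if (offsetMicros(candidate, zoneId) == offset) candidate else slow(micros, zoneId)
    } catch {
      case _: ArithmeticException => slow(micros, zoneId)      // overflow: fall back
    }
  }
}
{code}

For an instant in mid-July in {{America/Los_Angeles}} the offset at the candidate matches and the fast result is returned directly; for an instant in mid-March 2024 the first of the month lies before the DST switch, the guard fails, and the sketch falls back to the slow path.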
Also rewrite `TRUNC_TO_QUARTER` from `IsoFields.DAY_OF_QUARTER` (a
`TemporalField` whose adjustment produces a fresh `LocalDate`) to a direct
`withMonth(firstMonthOfQuarter).withDayOfMonth(1)` chain on the existing
`LocalDate`. This saves one allocation plus the field-adjustment overhead per
call.
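A minimal sketch of the two quarter-truncation routes side by side, assuming the first month of the quarter is derived from the month number by integer division. {{QuarterSketch}}, {{viaField}} and {{direct}} are hypothetical names used only for illustration; the real change lives inside `truncDate`.

{code:scala}
import java.time.LocalDate
import java.time.temporal.IsoFields

// Illustrative sketch only: object and method names are hypothetical.
object QuarterSketch {
  // Existing route: set day-of-quarter to 1 through the ISO field.
  def viaField(date: LocalDate): LocalDate =
    date.`with`(IsoFields.DAY_OF_QUARTER, 1L)

  // Direct route: months 1-3 map to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10.
  def direct(date: LocalDate): LocalDate = {
    val firstMonthOfQuarter = ((date.getMonthValue - 1) / 3) * 3 + 1
    date.withMonth(firstMonthOfQuarter).withDayOfMonth(1)
  }
}
{code}

Both functions should return the same date for any input, for example 2024-04-01 for 2024-05-31; only the amount of work per call differs.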
`truncTimestampSlow` covers every level explicitly so it serves as a
self-contained reference implementation -- the fast path's correctness can be
verified against it case-by-case.
h2. Benchmark
`DateTimeBenchmark` Truncation, wholestage on, ns/row, on a 12th Gen Intel
i7-1260P (master = pre-SPARK-56663):
|| level || master baseline || this PR || speedup ||
| WEEK | 165.2 | 78.2 | 2.11x |
| MONTH | 181.9 | 92.2 | 1.97x |
| MM | 182.2 | 92.5 | 1.97x |
| MON | 182.9 | 92.7 | 1.97x |
| QUARTER | 216.8 | 108.8 | 1.99x |
| YEAR | 205.2 | 96.7 | 2.12x |
| YYYY | 205.8 | 96.9 | 2.12x |
| YY | 206.3 | 96.0 | 2.15x |
Time-level units (MINUTE / HOUR / DAY / SECOND) and `trunc(date, ...)` are
unchanged within noise; this PR's hot path for those levels is byte-identical
to SPARK-56663 after the unification.
h2. Out of scope
* `trunc(date, ...)` (date input, no zoneId) -- this PR only changes the
`timestamp -> date_trunc` flow. The `TruncDate` expression bypasses
`truncTimestamp`; the only change visible to it is the `TRUNC_TO_QUARTER`
rewrite (which `trunc(date, ...)` doesn't use in the benchmark today).
* MICROSECOND / MILLISECOND / SECOND / MINUTE / HOUR / DAY -- handled by
SPARK-56663. The unification in this PR inlines the existing fast path into
`truncTimestamp` but does not change its semantics or measured perf.
* Per-instance offset cache -- a separate optimization that would amortize the
{{rules.getOffset}} call across rows. Would benefit both this PR's and
SPARK-56663's paths. Out of scope here.
* Integer-only calendar arithmetic (Hinnant-style) -- would eliminate the
remaining `LocalDate` allocation inside `truncDate` for MONTH/YEAR and push
date-level units to the same floor as time-level units. Out of scope here.
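For context on the last item, integer-only month truncation could look roughly like the sketch below. It follows the widely used days-to-civil arithmetic attributed to Howard Hinnant and is not part of this PR; {{CivilSketch}} and {{truncToMonth}} are hypothetical names, and only MONTH is shown.

{code:scala}
// Illustrative sketch only: names are hypothetical; proleptic Gregorian calendar.
object CivilSketch {
  // Returns the epoch-day of the first day of the month containing epochDay,
  // using integer arithmetic only (no LocalDate allocation).
  def truncToMonth(epochDay: Long): Long = {
    val z = epochDay + 719468L                  // days since 0000-03-01
    val era = Math.floorDiv(z, 146097L)         // 400-year era index
    val doe = z - era * 146097L                 // day of era, 0..146096
    val yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365 // year of era, 0..399
    val doy = doe - (365 * yoe + yoe / 4 - yoe / 100) // day of March-based year, 0..365
    val mp = (5 * doy + 2) / 153                // March-based month, 0..11
    val dayOfMonthZeroBased = doy - (153 * mp + 2) / 5
    epochDay - dayOfMonthZeroBased
  }
}
{code}

The result can be checked against {{LocalDate.ofEpochDay(d).withDayOfMonth(1).toEpochDay}} over a range of days.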
h2. Related
* SPARK-56663 - introduced the offset-arithmetic fast path for MINUTE / HOUR /
DAY; this PR extends the same pattern to the date-level units and inlines both
paths into a single implementation.
* SPARK-33404 - introduced the slow path that this family of changes is
recovering from.
* SPARK-30766 / SPARK-30857 - the DST-correctness invariants from these fixes
are preserved here via the offset-equality guard.
was:
h2. Background
`DateTimeUtils.truncTimestamp` for the WEEK / MONTH / QUARTER / YEAR levels
currently routes through:
{code:scala}
case _ => // Try to truncate date levels
val dDays = microsToDays(micros, zoneId)
daysToMicros(truncDate(dDays, level), zoneId)
{code}
`microsToDays` allocates `Instant` + `ZonedDateTime` + `LocalDate` per row;
`daysToMicros` allocates `LocalDate` + `ZonedDateTime` + `Instant`. `truncDate`
itself allocates one more `LocalDate` for MONTH/YEAR (in `getDayOfMonth` /
`getDayInYear`) and *two* for QUARTER (the existing implementation goes through
`IsoFields.DAY_OF_QUARTER`, which is a `TemporalAdjuster` that produces a fresh
`LocalDate`). The result is 167-218 ns/row on JDK 17 GH Actions runners.
SPARK-56663 introduced the offset-arithmetic + DST-equality-guard pattern for
the time-level units (MINUTE / HOUR / DAY) and confirmed the pattern is sound
for any unit that evenly divides {{MICROS_PER_DAY}}. The date-level branch is a
natural extension.
h2. Proposal
The framework for offset-based truncation -- resolve offset once, apply,
truncate in the local frame, re-apply, DST guard, fall back on
DST-cross/overflow -- is identical for every level above SECOND. Only the
"truncate in local frame" step varies. Fold the existing `truncToUnitFast`
(SPARK-56663) and the new date-level path into a single helper:
{code:scala}
private def truncTimestampFast(micros: Long, zoneId: ZoneId, level: Int): Long
private def truncTimestampSlow(micros: Long, zoneId: ZoneId, level: Int): Long
{code}
The local-frame truncation step is the only thing the fast path branches on:
* MINUTE / HOUR / DAY -- {{local - Math.floorMod(local, unitMicros)}}; pure
arithmetic, the existing path.
* WEEK / MONTH / QUARTER / YEAR -- compute local epoch-day by integer division,
run [[truncDate]] in the local-day frame, multiply back to local micros.
Everything else (offset resolve, `addExact` / `subtractExact`, DST guard,
slow-path fallback) is shared.
Also rewrite `TRUNC_TO_QUARTER` from `IsoFields.DAY_OF_QUARTER` (a
`TemporalAdjuster` that produces a fresh `LocalDate`) to a direct
`withMonth(firstMonthOfQuarter).withDayOfMonth(1)` chain on the existing
`LocalDate`. Saves one allocation + the adjuster overhead per call.
h2. Benchmark
`DateTimeBenchmark` Truncation, wholestage on, ns/row, on a 12th Gen Intel
i7-1260P (master = pre-SPARK-56663):
|| level || master baseline || this PR || speedup ||
| WEEK | 165.2 | 78.2 | 2.11x |
| MONTH | 181.9 | 92.2 | 1.97x |
| MM | 182.2 | 92.5 | 1.97x |
| MON | 182.9 | 92.7 | 1.97x |
| QUARTER | 216.8 | 108.8 | 1.99x |
| YEAR | 205.2 | 96.7 | 2.12x |
| YYYY | 205.8 | 96.9 | 2.12x |
| YY | 206.3 | 96.0 | 2.15x |
Time-level units (MINUTE / HOUR / DAY / SECOND) and `trunc(date, ...)` are
unchanged within noise; this PR's hot path for those levels is byte-identical
to SPARK-56663 after the unification.
h2. Out of scope
* `trunc(date, ...)` (date input, no zoneId) -- this PR only changes the
`timestamp -> date_trunc` flow. The `TruncDate` expression bypasses
`truncTimestamp`; the only change visible to it is the `TRUNC_TO_QUARTER`
rewrite (which `trunc(date, ...)` doesn't use in the benchmark today).
* MICROSECOND / MILLISECOND / SECOND / MINUTE / HOUR / DAY -- handled by
SPARK-56663. The unification in this PR refactors the existing fast path but
does not change its semantics or measured perf.
* Per-instance offset cache -- a separate optimization that would amortize the
{{rules.getOffset}} call across rows. Would benefit both this PR's and
SPARK-56663's paths. Out of scope here.
* Integer-only calendar arithmetic (Hinnant-style) -- would eliminate the
remaining `LocalDate` allocation inside `truncDate` for MONTH/YEAR and push
date-level units to the same floor as time-level units. Out of scope here.
h2. Related
* SPARK-56663 - introduced the offset-arithmetic fast path for MIN/HR/DAY; this
PR extends the same pattern to the date-level units and folds both paths into a
single helper.
* SPARK-33404 - introduced the slow path that this family of changes is
recovering from.
* SPARK-30766 / SPARK-30857 - the DST-correctness invariants from these fixes
are preserved here via the offset-equality guard.
> Add fast path for date_trunc WEEK/MONTH/QUARTER/YEAR
> ----------------------------------------------------
>
> Key: SPARK-56769
> URL: https://issues.apache.org/jira/browse/SPARK-56769
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Rito Takeuchi
> Priority: Major
>