LuciferYang opened a new pull request, #55893:
URL: https://github.com/apache/spark/pull/55893
### What changes were proposed in this pull request?
Add a `ZoneOffset.UTC` fast path to `DateTimeUtils.daysToMicros(days,
zoneId)`:
```scala
if (zoneId eq ZoneOffset.UTC) {
  Math.multiplyExact(days.toLong, MICROS_PER_DAY)
} else {
  // existing LocalDate -> ZonedDateTime -> Instant path
}
```
For UTC the answer is simply `days * MICROS_PER_DAY`, so the slow path's
three heap allocations (`LocalDate`, `ZonedDateTime`, `Instant`) are wasted.
### Why are the changes needed?
`daysToMicros(days, ZoneOffset.UTC)` is on the per-row hot path of the
vectorized parquet reader (`DateToTimestampNTZUpdater` and the rebase
variants), the row-based parquet converter (`ParquetRowConverter`), the Avro
reader (`AvroDeserializer`), and DATE -> TIMESTAMP `Cast` (interpreted +
codegen). All of them pass the `ZoneOffset.UTC` singleton, so the
reference-equality fast path triggers everywhere it matters.
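The reference-equality guard can be illustrated with a minimal sketch using only `java.time`. The object `UtcFastPathGuard` and the helper `takesFastPath` are hypothetical names introduced here for illustration; they are not part of the patch.

```scala
import java.time.{ZoneId, ZoneOffset}

object UtcFastPathGuard {
  // Hypothetical helper mirroring the `eq` guard: true only for the
  // ZoneOffset.UTC singleton itself, not for every UTC-equivalent zone.
  def takesFastPath(zoneId: ZoneId): Boolean = zoneId eq ZoneOffset.UTC

  def main(args: Array[String]): Unit = {
    // The singleton is the same object, so the guard matches.
    println(takesFastPath(ZoneOffset.UTC))
    // ZoneId.of("UTC") is a region-based ID, i.e. a different object,
    // so it would fall through to the general path.
    println(takesFastPath(ZoneId.of("UTC")))
  }
}
```

A caller that passes some other UTC-equivalent `ZoneId` still gets a correct result; it just goes through the existing general path instead of the multiplication.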
`ParquetVectorUpdaterBenchmark` results will be regenerated via GitHub
Actions.
### Does this PR introduce _any_ user-facing change?
No -- pure optimization. Behavior is preserved for every input in Spark's
valid `DateType` range. Both paths use `Math.multiplyExact` internally and
overflow at the same `|days| ~= 107M` boundary (`Long.MaxValue /
MICROS_PER_DAY`) with the same `ArithmeticException`, far outside any
reachable input.
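As a rough check of where that boundary sits, here is a small sketch of the arithmetic. It assumes `MICROS_PER_DAY` is 24 * 60 * 60 * 1,000,000; `DaysToMicrosOverflow`, `maxSafeDays`, `utcDaysToMicros`, and `overflows` are hypothetical names used only for this illustration.

```scala
object DaysToMicrosOverflow {
  // Assumed to match Spark's constant: microseconds in one 24-hour day.
  val MICROS_PER_DAY: Long = 86400L * 1000L * 1000L

  // Largest |days| whose microsecond value still fits in a Long
  // (hypothetical name for illustration).
  val maxSafeDays: Long = Long.MaxValue / MICROS_PER_DAY

  // Same arithmetic as the UTC fast path: checked multiplication.
  def utcDaysToMicros(days: Int): Long =
    Math.multiplyExact(days.toLong, MICROS_PER_DAY)

  // True if the checked multiplication throws for this day count.
  def overflows(days: Int): Boolean =
    try { utcDaysToMicros(days); false }
    catch { case _: ArithmeticException => true }
}
```

Under that assumption, `maxSafeDays` works out to 106,751,991, so the first overflowing inputs are one day beyond that in either direction.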
### How was this patch tested?
A new `DateTimeUtilsSuite` contract test pins down the UTC fast path:
- Asserts it agrees with the general path, exercised through a zero-offset
zone (`Etc/GMT`), for a representative set of `days` (zero, positive,
negative, `+/-maxSafeDays`).
- Asserts it equals `days * MICROS_PER_DAY` directly, so divergence in some
future JDK is caught.
- Asserts `ArithmeticException` on overflow.
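The checks listed above could look roughly like the following standalone sketch. It is not the actual `DateTimeUtilsSuite` code: `slowPath` is a simplified stand-in for Spark's general conversion, and `DaysToMicrosContract`, `representativeDays`, and `agrees` are hypothetical names. The constants are assumed to match Spark's.

```scala
import java.time.{LocalDate, ZoneId, ZoneOffset}

object DaysToMicrosContract {
  // Assumed to match Spark's constants.
  val MICROS_PER_SECOND: Long = 1000L * 1000L
  val MICROS_PER_DAY: Long = 86400L * MICROS_PER_SECOND

  // Simplified stand-in for the general path:
  // LocalDate -> ZonedDateTime -> Instant -> microseconds.
  def slowPath(days: Int, zoneId: ZoneId): Long = {
    val instant = LocalDate.ofEpochDay(days.toLong).atStartOfDay(zoneId).toInstant
    Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, MICROS_PER_SECOND),
      (instant.getNano / 1000).toLong)
  }

  // Fast path for the ZoneOffset.UTC singleton, general path otherwise.
  def daysToMicros(days: Int, zoneId: ZoneId): Long =
    if (zoneId eq ZoneOffset.UTC) Math.multiplyExact(days.toLong, MICROS_PER_DAY)
    else slowPath(days, zoneId)

  val maxSafeDays: Int = (Long.MaxValue / MICROS_PER_DAY).toInt

  // Zero, positive, negative, and the largest non-overflowing magnitudes.
  val representativeDays: Seq[Int] =
    Seq(0, 1, -1, 18262, -719162, maxSafeDays, -maxSafeDays)

  // The fast path must match both the Etc/GMT general path and the raw product.
  def agrees(days: Int): Boolean = {
    val fast = daysToMicros(days, ZoneOffset.UTC)
    fast == daysToMicros(days, ZoneId.of("Etc/GMT")) &&
      fast == days.toLong * MICROS_PER_DAY
  }
}
```

Because `Etc/GMT` is not the `ZoneOffset.UTC` singleton, it always takes the general path here, which is what makes the comparison meaningful.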
### Was this patch authored or co-authored using generative AI tooling?
No