vranes commented on code in PR #55535:
URL: https://github.com/apache/spark/pull/55535#discussion_r3161729880
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala:
##########
@@ -3897,3 +3899,178 @@ case class TimestampDiff(
copy(startTimestamp = newLeft, endTimestamp = newRight)
}
}
+
+/**
+ * Aligns a timestamp to the start of a fixed-size interval bucket.
+ *
+ * Returns the start of the half-open bucket [start, start + bucketSize) containing ts.
+ * All computation is performed on UTC values.
+ */
+case class TimeBucket(
+ bucketSize: Expression,
+ ts: Expression,
+ originTs: Expression)
+ extends TernaryExpression with ExpectsInputTypes {
Review Comment:
Synced offline.
To summarize the agreed-upon approach for TIMESTAMP_LTZ:
- Year-month (YM) interval bucketing is done in the local (session) time zone.
- Day-time (DT) interval bucketing follows TimestampAddInterval: calendar addition in the local time zone on the days part, then the remaining microseconds are added.
- The default origin is 1970-01-01 00:00:00 in the local time zone.
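
To make the YM rule concrete, here is a minimal sketch using plain java.time rather than the PR's helpers. The object name `MonthBucketSketch` and the `bucketStart` signature are hypothetical and only illustrate the agreed semantics: find the largest k such that origin + k * bucketMonths months, added on the local calendar, is not after ts.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZonedDateTime}
import java.time.temporal.ChronoUnit

// Illustrative only: not the PR's implementation, just the same idea in java.time.
object MonthBucketSketch {
  def bucketStart(
      ts: Instant,
      bucketMonths: Int,
      zone: ZoneId,
      origin: Option[Instant] = None): Instant = {
    require(bucketMonths > 0, "bucket size must be positive")
    // Default origin: 1970-01-01 00:00:00 interpreted in the given zone.
    val o: ZonedDateTime = origin.map(_.atZone(zone))
      .getOrElse(LocalDateTime.of(1970, 1, 1, 0, 0).atZone(zone))
    val t = ts.atZone(zone)
    // First guess for k from whole local months, then correct it in both directions.
    var k = Math.floorDiv(ChronoUnit.MONTHS.between(o, t), bucketMonths.toLong)
    while (o.plusMonths(k * bucketMonths).isAfter(t)) k -= 1
    while (!o.plusMonths((k + 1) * bucketMonths).isAfter(t)) k += 1
    // Month addition happens on the local calendar, so the result is a local month start.
    o.plusMonths(k * bucketMonths).toInstant
  }
}
```

With `America/Los_Angeles` as the zone and a one-month bucket, a timestamp in mid-March 2024 maps to 2024-03-01T08:00:00Z (local midnight in PST), while one in mid-April maps to 2024-04-01T07:00:00Z (local midnight in PDT). Both are local month starts even though their UTC offsets differ.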
What was done:
TimeBucket now extends TimeZoneAwareExpression, and zoneId is threaded through
both helpers.
The semantics are now:
- YM bucketing matches origin + INTERVAL '<k*bucketMonths>' MONTH evaluated in the session zone, so monthly/yearly buckets land at the local month start across DST transitions.
- DT bucketing matches origin + INTERVAL '<k*bucketSize>' via timestampAddDayTime for buckets ≥ 1 day; sub-day buckets stay in pure UTC arithmetic because they are fixed-length.
- For TIMESTAMP_NTZ, the helpers receive ZoneOffset.UTC.
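
The DT rule can be sketched the same way. The code below is an illustration in plain java.time under stated assumptions, not the PR's code: `DayTimeBucketSketch`, `bucketStart`, and `add` are hypothetical names, the bucket size is given in microseconds, and the origin is fixed to 1970-01-01 00:00:00 in the supplied zone. It does not call Spark's timestampAddDayTime; it only mimics "calendar days first, then remaining micros".

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZonedDateTime}
import java.time.temporal.ChronoUnit

// Illustrative only: mirrors the described semantics, not Spark's helpers.
object DayTimeBucketSketch {
  private val MicrosPerDay = 86400000000L

  // origin + k * bucket: whole days are added on the local calendar,
  // the remaining microseconds are added as an exact duration.
  private def add(o: ZonedDateTime, k: Long, bucketMicros: Long): ZonedDateTime = {
    val total = Math.multiplyExact(k, bucketMicros)
    o.plusDays(total / MicrosPerDay).plus(total % MicrosPerDay, ChronoUnit.MICROS)
  }

  def bucketStart(ts: Instant, bucketMicros: Long, zone: ZoneId): Instant = {
    require(bucketMicros > 0, "bucket size must be positive")
    val o = LocalDateTime.of(1970, 1, 1, 0, 0).atZone(zone)
    // Fixed-length estimate of k from the elapsed microseconds.
    val k0 = Math.floorDiv(ChronoUnit.MICROS.between(o.toInstant, ts), bucketMicros)
    if (bucketMicros < MicrosPerDay) {
      // Sub-day buckets are fixed-length, so the estimate is exact.
      o.toInstant.plus(Math.multiplyExact(k0, bucketMicros), ChronoUnit.MICROS)
    } else {
      // Buckets of a day or more: correct k so the local-calendar start brackets ts.
      val t = ts.atZone(zone)
      var k = k0
      while (add(o, k, bucketMicros).isAfter(t)) k -= 1
      while (!add(o, k + 1, bucketMicros).isAfter(t)) k += 1
      add(o, k, bucketMicros).toInstant
    }
  }
}
```

With `America/Los_Angeles` and a one-day bucket, 2024-03-10T20:00:00Z falls in the bucket starting at 2024-03-10T08:00:00Z, and 2024-03-11T20:00:00Z in the one starting at 2024-03-11T07:00:00Z: both are local midnight, on either side of the DST switch. Passing `ZoneOffset.UTC` as the zone reduces every branch to plain UTC arithmetic, which corresponds to the TIMESTAMP_NTZ case.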
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]