vranes commented on code in PR #55535:
URL: https://github.com/apache/spark/pull/55535#discussion_r3161729880


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala:
##########
@@ -3897,3 +3899,178 @@ case class TimestampDiff(
     copy(startTimestamp = newLeft, endTimestamp = newRight)
   }
 }
+
+/**
+ * Aligns a timestamp to the start of a fixed-size interval bucket.
+ *
+ * Returns the start of the half-open bucket [start, start + bucketSize) containing ts.
+ * All computation is performed on UTC values.
+ */
+case class TimeBucket(
+    bucketSize: Expression,
+    ts: Expression,
+    originTs: Expression)
+  extends TernaryExpression with ExpectsInputTypes {
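
The half-open contract in the doc comment above (bucket start included, next start excluded) can be sketched as plain UTC microsecond arithmetic. This is a hypothetical illustration, not the PR's implementation. UtcBucketSketch and alignUtc are invented names, and the sketch assumes the bucket size has already been converted to a positive number of microseconds.

```scala
// Hypothetical sketch, not the PR's code: align `tsMicros` to the start of the
// half-open bucket [start, start + bucketMicros) counted from `originMicros`.
// All values are microseconds since the Unix epoch, interpreted in UTC.
object UtcBucketSketch {
  def alignUtc(bucketMicros: Long, tsMicros: Long, originMicros: Long): Long = {
    require(bucketMicros > 0, "bucket size must be positive")
    // floorDiv (not `/`) rounds toward negative infinity, so a timestamp before
    // the origin lands in the preceding bucket instead of the origin's bucket.
    val k = Math.floorDiv(Math.subtractExact(tsMicros, originMicros), bucketMicros)
    Math.addExact(originMicros, Math.multiplyExact(k, bucketMicros))
  }
}
```

With one-hour buckets (3600000000 micros) and origin 0, a timestamp at 1.5 hours aligns to 1 hour, while a timestamp one microsecond before the epoch aligns to -1 hour.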

Review Comment:
   Synced offline.
   To summarize the agreed-upon approach for LTZ TIMESTAMP:
   - YM interval bucketing is done in the session (LTZ) time zone.
   - DT interval bucketing is aligned with TimestampAddInterval: LTZ calendar addition on the days part, then addition of the remaining micros.
   - The default origin is 1970-01-01 00:00:00 in LTZ.
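
A minimal sketch of the YM rule, assuming timestamps are microseconds since the epoch and using only java.time. YmBucketSketch, alignMonths and defaultOrigin are hypothetical names, not the PR's helpers. The idea is to count whole local calendar months between origin and ts, floor that count to a multiple of the bucket, add it back to the origin in the zone, and step back one bucket if ts falls earlier in its month than the origin's day and time.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZonedDateTime}
import java.time.temporal.ChronoUnit

// Hypothetical sketch of month/year bucketing in a session time zone.
// A bucket starts at origin.plusMonths(k * bucketMonths), evaluated as local
// wall-clock time in `zone`.
object YmBucketSketch {
  private def toZoned(micros: Long, zone: ZoneId): ZonedDateTime =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS).atZone(zone)

  private def toMicros(zdt: ZonedDateTime): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, zdt.toInstant)

  /** Assumed default origin: 1970-01-01 00:00:00 local time in `zone`. */
  def defaultOrigin(zone: ZoneId): Long =
    toMicros(LocalDateTime.of(1970, 1, 1, 0, 0).atZone(zone))

  def alignMonths(bucketMonths: Int, tsMicros: Long, originMicros: Long, zone: ZoneId): Long = {
    require(bucketMonths > 0, "bucket size must be positive")
    val ts = toZoned(tsMicros, zone)
    val origin = toZoned(originMicros, zone)
    // Calendar months between the two local dates, ignoring day and time.
    val monthDiff = (ts.getYear - origin.getYear) * 12L +
      (ts.getMonthValue - origin.getMonthValue)
    var k = Math.floorDiv(monthDiff, bucketMonths.toLong)
    var start = origin.plusMonths(k * bucketMonths)
    // ts is earlier in its month than the origin's day/time: use the previous bucket.
    if (start.isAfter(ts)) {
      k -= 1
      start = origin.plusMonths(k * bucketMonths)
    }
    toMicros(start)
  }
}
```

Because the month arithmetic runs on local date-times, a monthly bucket stays pinned to local midnight on the 1st even when the UTC offset at the bucket start differs from the offset at the origin (for example, PST at the origin and PDT in April).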
   
   What was done:
   Made TimeBucket extend TimeZoneAwareExpression and threaded zoneId through both helpers.
   The semantics are now:
   - YM bucketing: the bucket start matches origin + INTERVAL '<k*bucketMonths>' MONTH evaluated in the session zone, so monthly and yearly buckets land at the local month start across DST transitions.
   - DT bucketing: for buckets of 1 day or more, the bucket start matches origin + INTERVAL '<k*bucketSize>' computed via timestampAddDayTime. Sub-day buckets stay pure UTC arithmetic because they have a fixed length.

   For TIMESTAMP_NTZ, the helpers receive ZoneOffset.UTC.
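
The DT rule can be sketched the same way. This does not call Spark's timestampAddDayTime. It is a hypothetical stand-in that follows the semantics described above: the days part of each step is added as a local calendar addition in the zone, and the remaining micros are added physically. DtBucketSketch, alignDayTime and alignNtz are invented names, and the bucket size is assumed to be a positive number of microseconds. The UTC estimate for k is corrected by stepping, since offset changes between origin and ts can shift the true bucket.

```scala
import java.time.{Instant, ZoneId, ZoneOffset}
import java.time.temporal.ChronoUnit

// Hypothetical sketch of day-time bucketing with a session zone.
object DtBucketSketch {
  val MicrosPerDay: Long = 86400L * 1000000L

  // origin + k buckets: days part as local calendar addition, remainder as physical micros.
  private def addBuckets(originMicros: Long, k: Long, bucketMicros: Long, zone: ZoneId): Long = {
    val days = bucketMicros / MicrosPerDay
    val rem = bucketMicros % MicrosPerDay
    val shifted = Instant.EPOCH.plus(originMicros, ChronoUnit.MICROS)
      .atZone(zone).plusDays(Math.multiplyExact(k, days)).toInstant
    Math.addExact(ChronoUnit.MICROS.between(Instant.EPOCH, shifted), Math.multiplyExact(k, rem))
  }

  def alignDayTime(bucketMicros: Long, tsMicros: Long, originMicros: Long, zone: ZoneId): Long = {
    require(bucketMicros > 0, "bucket size must be positive")
    val k0 = Math.floorDiv(Math.subtractExact(tsMicros, originMicros), bucketMicros)
    if (bucketMicros < MicrosPerDay) {
      // Sub-day buckets are fixed-length: pure UTC arithmetic.
      originMicros + k0 * bucketMicros
    } else {
      // Start from the UTC estimate, then correct for offset changes (e.g. DST).
      var k = k0
      while (addBuckets(originMicros, k, bucketMicros, zone) > tsMicros) k -= 1
      while (addBuckets(originMicros, k + 1, bucketMicros, zone) <= tsMicros) k += 1
      addBuckets(originMicros, k, bucketMicros, zone)
    }
  }

  // TIMESTAMP_NTZ: same helper with UTC, where a calendar day is always 24 hours.
  def alignNtz(bucketMicros: Long, tsMicros: Long, originMicros: Long): Long =
    alignDayTime(bucketMicros, tsMicros, originMicros, ZoneOffset.UTC)
}
```

With a local-midnight origin in America/Los_Angeles, 2024-03-11 12:00 (the day after the DST switch) aligns to local midnight on 2024-03-11, whereas pure UTC arithmetic would give 01:00 local.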
