[
https://issues.apache.org/jira/browse/SPARK-56594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-56594:
-----------------------------------
Labels: pull-request-available (was: )
> Add time_bucket scalar function for interval-based timestamp bucketing
> ----------------------------------------------------------------------
>
> Key: SPARK-56594
> URL: https://issues.apache.org/jira/browse/SPARK-56594
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Nikolina Vraneš
> Priority: Major
> Labels: pull-request-available
>
> h2. Summary
> Add a new scalar function {{time_bucket(bucketSize, ts [, origin])}} that
> returns the start of the fixed-size time bucket containing {{{}ts{}}},
> aligned to a configurable origin point.
> h2. Syntax
> {code:sql}
> time_bucket(bucketSize, ts [, origin])
> {code}
> h2. Arguments
> * {{{}bucketSize{}}}: an interval expression (day-time or year-month)
> defining the bucket size. Must be positive and foldable.
> * {{{}ts{}}}: a TIMESTAMP or TIMESTAMP_NTZ expression to be bucketed.
> * {{origin}} (optional): alignment anchor. Defaults to {{{}1970-01-01
> 00:00:00 UTC{}}}. Must be the same type as {{ts}} and must be foldable.
> h2. Return type
> Same as {{ts}} (TIMESTAMP or TIMESTAMP_NTZ).
> h2. Behavior
> * Divides the time axis into consecutive, non-overlapping intervals of the
> specified size, anchored at {{{}origin{}}}.
> * Returns the start of the half-open bucket {{[start, start + bucketSize)}}
> containing {{{}ts{}}}.
> * Day-time intervals produce fixed-size buckets (microsecond floor division).
> * Year-month intervals produce calendar-aligned buckets (end-of-month
> capping with step-back semantics).
> * All computation is performed on UTC values; session timezone does not
> affect bucketing. Users can pass a shifted `origin` to align buckets to a
> custom offset (e.g. local midnight), or cast to TIMESTAMP_NTZ for local
> wall-clock bucketing in a DST zone.
> * Any NULL argument returns NULL.
> h2. Examples
> {code:sql}
> SELECT time_bucket(INTERVAL '15' MINUTE, TIMESTAMP '2024-01-01 11:27:00');
> -- 2024-01-01 11:15:00
>
>
>
> SELECT time_bucket(INTERVAL '1' HOUR, TIMESTAMP '2024-01-01 11:27:00');
>
>
> -- 2024-01-01 11:00:00
>
>
>
> SELECT time_bucket(INTERVAL '1' MONTH, TIMESTAMP '2024-03-15 11:27:00');
>
>
> -- 2024-03-01 00:00:00
>
>
>
> SELECT time_bucket(INTERVAL '1' HOUR,
>
>
> TIMESTAMP '2024-01-01 11:27:00',
> TIMESTAMP '1970-01-01 00:05:00');
>
>
> -- 2024-01-01 11:05:00
>
>
> {code}
> h2. Errors
> * {{DATATYPE_MISMATCH.NON_FOLDABLE_INPUT}} — bucketSize or origin is not
> foldable.
> * {{DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE}} — bucketSize is zero or negative.
> * {{DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE}} — ts and origin have
> mismatched types.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]