[
https://issues.apache.org/jira/browse/SPARK-56594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nikolina Vraneš updated SPARK-56594:
------------------------------------
Description:
h2. Summary
Add a new scalar function {{time_bucket(bucketSize, ts [, origin])}} that
returns the start of the fixed-size time bucket containing {{{}ts{}}}, aligned
to a configurable origin point.
h2. Syntax
{code:sql}
time_bucket(bucketSize, ts [, origin])
{code}
h2. Arguments
* {{{}bucketSize{}}}: an interval expression (day-time or year-month) defining
the bucket size. Must be positive and foldable.
* {{{}ts{}}}: a TIMESTAMP or TIMESTAMP_NTZ expression to be bucketed.
* {{origin}} (optional): alignment anchor. Defaults to {{{}1970-01-01
00:00:00{}}}. Must be the same type as {{ts}} and must be foldable.
h2. Return type
Same as {{ts}} (TIMESTAMP or TIMESTAMP_NTZ).
h2. Behavior
* Divides the time axis into consecutive, non-overlapping intervals of the
specified size, anchored at {{{}origin{}}}.
* Returns the start of the half-open bucket {{[start, start + bucketSize)}}
containing {{{}ts{}}}.
* For `TIMESTAMP_NTZ`, bucketing is performed in UTC. For `TIMESTAMP`,
year-month interval buckets and calendar-day components of day-time interval
buckets align to the session time zone, matching `+ INTERVAL '<k*bucket_size>'`
semantics across DST transitions.
* Any NULL argument returns NULL.
h2. Examples
{code:sql}
SELECT time_bucket(INTERVAL '15' MINUTE, TIMESTAMP '2024-01-01 11:27:00');
-- 2024-01-01 11:15:00
SELECT time_bucket(INTERVAL '1' HOUR, TIMESTAMP '2024-01-01 11:27:00');
-- 2024-01-01 11:00:00
SELECT time_bucket(INTERVAL '1' MONTH, TIMESTAMP '2024-03-15 11:27:00');
-- 2024-03-01 00:00:00
SELECT time_bucket(INTERVAL '1' HOUR,
TIMESTAMP '2024-01-01 11:27:00',
TIMESTAMP '1970-01-01 00:05:00');
-- 2024-01-01 11:05:00
{code}
h2. Errors
* {{DATATYPE_MISMATCH.NON_FOLDABLE_INPUT}} — bucketSize or origin is not
foldable.
* {{DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE}} — bucketSize is zero or negative.
* {{DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE}} — ts and origin have mismatched
types.
was:
h2. Summary
Add a new scalar function {{time_bucket(bucketSize, ts [, origin])}} that
returns the start of the fixed-size time bucket containing {{{}ts{}}}, aligned
to a configurable origin point.
h2. Syntax
{code:sql}
time_bucket(bucketSize, ts [, origin])
{code}
h2. Arguments
* {{{}bucketSize{}}}: an interval expression (day-time or year-month) defining
the bucket size. Must be positive and foldable.
* {{{}ts{}}}: a TIMESTAMP or TIMESTAMP_NTZ expression to be bucketed.
* {{origin}} (optional): alignment anchor. Defaults to {{{}1970-01-01 00:00:00
UTC{}}}. Must be the same type as {{ts}} and must be foldable.
h2. Return type
Same as {{ts}} (TIMESTAMP or TIMESTAMP_NTZ).
h2. Behavior
* Divides the time axis into consecutive, non-overlapping intervals of the
specified size, anchored at {{{}origin{}}}.
* Returns the start of the half-open bucket {{[start, start + bucketSize)}}
containing {{{}ts{}}}.
* For `TIMESTAMP_NTZ`, bucketing is performed in UTC. For `TIMESTAMP`,
year-month interval buckets and calendar-day components of day-time interval
buckets align to the session time zone, matching `+ INTERVAL '<k*bucket_size>'`
semantics across DST transitions.
* Any NULL argument returns NULL.
h2. Examples
{code:sql}
SELECT time_bucket(INTERVAL '15' MINUTE, TIMESTAMP '2024-01-01 11:27:00');
-- 2024-01-01 11:15:00
SELECT time_bucket(INTERVAL '1' HOUR, TIMESTAMP '2024-01-01 11:27:00');
-- 2024-01-01 11:00:00
SELECT time_bucket(INTERVAL '1' MONTH, TIMESTAMP '2024-03-15 11:27:00');
-- 2024-03-01 00:00:00
SELECT time_bucket(INTERVAL '1' HOUR,
TIMESTAMP '2024-01-01 11:27:00',
TIMESTAMP '1970-01-01 00:05:00');
-- 2024-01-01 11:05:00
{code}
h2. Errors
* {{DATATYPE_MISMATCH.NON_FOLDABLE_INPUT}} — bucketSize or origin is not
foldable.
* {{DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE}} — bucketSize is zero or negative.
* {{DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE}} — ts and origin have mismatched
types.
> Add time_bucket scalar function for interval-based timestamp bucketing
> ----------------------------------------------------------------------
>
> Key: SPARK-56594
> URL: https://issues.apache.org/jira/browse/SPARK-56594
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Nikolina Vraneš
> Assignee: Nikolina Vraneš
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.2.0
>
>
> h2. Summary
> Add a new scalar function {{time_bucket(bucketSize, ts [, origin])}} that
> returns the start of the fixed-size time bucket containing {{{}ts{}}},
> aligned to a configurable origin point.
> h2. Syntax
> {code:sql}
> time_bucket(bucketSize, ts [, origin])
> {code}
> h2. Arguments
> * {{{}bucketSize{}}}: an interval expression (day-time or year-month)
> defining the bucket size. Must be positive and foldable.
> * {{{}ts{}}}: a TIMESTAMP or TIMESTAMP_NTZ expression to be bucketed.
> * {{origin}} (optional): alignment anchor. Defaults to {{{}1970-01-01
> 00:00:00{}}}. Must be the same type as {{ts}} and must be foldable.
> h2. Return type
> Same as {{ts}} (TIMESTAMP or TIMESTAMP_NTZ).
> h2. Behavior
> * Divides the time axis into consecutive, non-overlapping intervals of the
> specified size, anchored at {{{}origin{}}}.
> * Returns the start of the half-open bucket {{[start, start + bucketSize)}}
> containing {{{}ts{}}}.
> * For `TIMESTAMP_NTZ`, bucketing is performed in UTC. For `TIMESTAMP`,
> year-month interval buckets and calendar-day components of day-time interval
> buckets align to the session time zone, matching `+ INTERVAL
> '<k*bucket_size>'` semantics across DST transitions.
> * Any NULL argument returns NULL.
> h2. Examples
> {code:sql}
> SELECT time_bucket(INTERVAL '15' MINUTE, TIMESTAMP '2024-01-01 11:27:00');
> -- 2024-01-01 11:15:00
>
>
>
> SELECT time_bucket(INTERVAL '1' HOUR, TIMESTAMP '2024-01-01 11:27:00');
>
>
> -- 2024-01-01 11:00:00
>
>
>
> SELECT time_bucket(INTERVAL '1' MONTH, TIMESTAMP '2024-03-15 11:27:00');
>
>
> -- 2024-03-01 00:00:00
>
>
>
> SELECT time_bucket(INTERVAL '1' HOUR,
>
>
> TIMESTAMP '2024-01-01 11:27:00',
> TIMESTAMP '1970-01-01 00:05:00');
>
>
> -- 2024-01-01 11:05:00
>
>
> {code}
> h2. Errors
> * {{DATATYPE_MISMATCH.NON_FOLDABLE_INPUT}} — bucketSize or origin is not
> foldable.
> * {{DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE}} — bucketSize is zero or negative.
> * {{DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE}} — ts and origin have
> mismatched types.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]