[ 
https://issues.apache.org/jira/browse/SPARK-56594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-56594:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add time_bucket scalar function for interval-based timestamp bucketing
> ----------------------------------------------------------------------
>
>                 Key: SPARK-56594
>                 URL: https://issues.apache.org/jira/browse/SPARK-56594
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Nikolina Vraneš
>            Priority: Major
>              Labels: pull-request-available
>
> h2. Summary
> Add a new scalar function {{time_bucket(bucketSize, ts [, origin])}} that 
> returns the start of the fixed-size time bucket containing {{{}ts{}}}, 
> aligned to a configurable origin point.
> h2. Syntax
> {code:sql}
>   time_bucket(bucketSize, ts [, origin])
>   {code}
> h2. Arguments
>  * {{{}bucketSize{}}}: an interval expression (day-time or year-month) 
> defining the bucket size. Must be positive and foldable.
>  * {{{}ts{}}}: a TIMESTAMP or TIMESTAMP_NTZ expression to be bucketed.
>  * {{origin}} (optional): alignment anchor. Defaults to {{{}1970-01-01 
> 00:00:00 UTC{}}}. Must be the same type as {{ts}} and must be foldable.
> h2. Return type
> Same as {{ts}} (TIMESTAMP or TIMESTAMP_NTZ).
> h2. Behavior
>  * Divides the time axis into consecutive, non-overlapping intervals of the 
> specified size, anchored at {{{}origin{}}}.
>  * Returns the start of the half-open bucket {{[start, start + bucketSize)}} 
> containing {{{}ts{}}}.
>  * Day-time intervals produce fixed-size buckets (microsecond floor division).
>  * Year-month intervals produce calendar-aligned buckets (end-of-month 
> capping with step-back semantics).
>  * All computation is performed on UTC values; session timezone does not 
> affect bucketing. Users can pass a shifted `origin` to align buckets to a 
> custom offset (e.g. local midnight), or cast to TIMESTAMP_NTZ for local 
> wall-clock bucketing in a DST zone.
>  * Any NULL argument returns NULL.
> h2. Examples
> {code:sql}
>   SELECT time_bucket(INTERVAL '15' MINUTE, TIMESTAMP '2024-01-01 11:27:00');
>   -- 2024-01-01 11:15:00                                                      
>                                                                               
>                                                       
>    
>   SELECT time_bucket(INTERVAL '1' HOUR, TIMESTAMP '2024-01-01 11:27:00');     
>                                                                               
>                                                       
>   -- 2024-01-01 11:00:00                            
>                                                                               
>                                                                               
>                                                       
>   SELECT time_bucket(INTERVAL '1' MONTH, TIMESTAMP '2024-03-15 11:27:00');    
>                                                                               
>                                                       
>   -- 2024-03-01 00:00:00
>                                                                               
>                                                                               
>                                                       
>   SELECT time_bucket(INTERVAL '1' HOUR,                                       
>                                                                               
>                                                       
>                     TIMESTAMP '2024-01-01 11:27:00',
>                     TIMESTAMP '1970-01-01 00:05:00');                         
>                                                                               
>                                                       
>   -- 2024-01-01 11:05:00                                                      
>                                                                               
>                                                       
>   {code}
> h2. Errors
>  * {{DATATYPE_MISMATCH.NON_FOLDABLE_INPUT}} — bucketSize or origin is not 
> foldable.
>  * {{DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE}} — bucketSize is zero or negative.
>  * {{DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE}} — ts and origin have 
> mismatched types.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to