[ 
https://issues.apache.org/jira/browse/SPARK-56594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolina Vraneš updated SPARK-56594:
------------------------------------
    Description: 
h2. Summary

Add a new scalar function {{time_bucket(bucketSize, ts [, origin])}} that 
returns the start of the fixed-size time bucket containing {{{}ts{}}}, aligned 
to a configurable origin point.
h2. Syntax
{code:sql}
  time_bucket(bucketSize, ts [, origin])
  {code}
h2. Arguments
 * {{{}bucketSize{}}}: an interval expression (day-time or year-month) defining 
the bucket size. Must be positive and foldable.
 * {{{}ts{}}}: a TIMESTAMP or TIMESTAMP_NTZ expression to be bucketed.
 * {{origin}} (optional): alignment anchor. Defaults to {{{}1970-01-01 
00:00:00{}}}. Must be the same type as {{ts}} and must be foldable.

h2. Return type

Same as {{ts}} (TIMESTAMP or TIMESTAMP_NTZ).
h2. Behavior
 * Divides the time axis into consecutive, non-overlapping intervals of the 
specified size, anchored at {{{}origin{}}}.
 * Returns the start of the half-open bucket {{[start, start + bucketSize)}} 
containing {{{}ts{}}}.
 * For `TIMESTAMP_NTZ`, bucketing is performed in UTC. For `TIMESTAMP`, 
year-month interval buckets and calendar-day components of day-time interval 
buckets align to the session time zone, matching `+ INTERVAL '<k*bucket_size>'` 
semantics across DST transitions.

 * Any NULL argument returns NULL.

h2. Examples
{code:sql}
  SELECT time_bucket(INTERVAL '15' MINUTE, TIMESTAMP '2024-01-01 11:27:00');
  -- 2024-01-01 11:15:00                                                        
                                                                                
                                                  
   
  SELECT time_bucket(INTERVAL '1' HOUR, TIMESTAMP '2024-01-01 11:27:00');       
                                                                                
                                                  
  -- 2024-01-01 11:00:00                            
                                                                                
                                                                                
                                                  
  SELECT time_bucket(INTERVAL '1' MONTH, TIMESTAMP '2024-03-15 11:27:00');      
                                                                                
                                                  
  -- 2024-03-01 00:00:00
                                                                                
                                                                                
                                                  
  SELECT time_bucket(INTERVAL '1' HOUR,                                         
                                                                                
                                                  
                    TIMESTAMP '2024-01-01 11:27:00',
                    TIMESTAMP '1970-01-01 00:05:00');                           
                                                                                
                                                  
  -- 2024-01-01 11:05:00                                                        
                                                                                
                                                  
  {code}
h2. Errors
 * {{DATATYPE_MISMATCH.NON_FOLDABLE_INPUT}} — bucketSize or origin is not 
foldable.
 * {{DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE}} — bucketSize is zero or negative.
 * {{DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE}} — ts and origin have mismatched 
types.

  was:
h2. Summary

Add a new scalar function {{time_bucket(bucketSize, ts [, origin])}} that 
returns the start of the fixed-size time bucket containing {{{}ts{}}}, aligned 
to a configurable origin point.
h2. Syntax
{code:sql}
  time_bucket(bucketSize, ts [, origin])
  {code}
h2. Arguments
 * {{{}bucketSize{}}}: an interval expression (day-time or year-month) defining 
the bucket size. Must be positive and foldable.
 * {{{}ts{}}}: a TIMESTAMP or TIMESTAMP_NTZ expression to be bucketed.
 * {{origin}} (optional): alignment anchor. Defaults to {{{}1970-01-01 00:00:00 
UTC{}}}. Must be the same type as {{ts}} and must be foldable.

h2. Return type

Same as {{ts}} (TIMESTAMP or TIMESTAMP_NTZ).
h2. Behavior
 * Divides the time axis into consecutive, non-overlapping intervals of the 
specified size, anchored at {{{}origin{}}}.
 * Returns the start of the half-open bucket {{[start, start + bucketSize)}} 
containing {{{}ts{}}}.
 * For `TIMESTAMP_NTZ`, bucketing is performed in UTC. For `TIMESTAMP`, 
year-month interval buckets and calendar-day components of day-time interval 
buckets align to the session time zone, matching `+ INTERVAL '<k*bucket_size>'` 
semantics across DST transitions.

 * Any NULL argument returns NULL.

h2. Examples
{code:sql}
  SELECT time_bucket(INTERVAL '15' MINUTE, TIMESTAMP '2024-01-01 11:27:00');
  -- 2024-01-01 11:15:00                                                        
                                                                                
                                                  
   
  SELECT time_bucket(INTERVAL '1' HOUR, TIMESTAMP '2024-01-01 11:27:00');       
                                                                                
                                                  
  -- 2024-01-01 11:00:00                            
                                                                                
                                                                                
                                                  
  SELECT time_bucket(INTERVAL '1' MONTH, TIMESTAMP '2024-03-15 11:27:00');      
                                                                                
                                                  
  -- 2024-03-01 00:00:00
                                                                                
                                                                                
                                                  
  SELECT time_bucket(INTERVAL '1' HOUR,                                         
                                                                                
                                                  
                    TIMESTAMP '2024-01-01 11:27:00',
                    TIMESTAMP '1970-01-01 00:05:00');                           
                                                                                
                                                  
  -- 2024-01-01 11:05:00                                                        
                                                                                
                                                  
  {code}
h2. Errors
 * {{DATATYPE_MISMATCH.NON_FOLDABLE_INPUT}} — bucketSize or origin is not 
foldable.
 * {{DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE}} — bucketSize is zero or negative.
 * {{DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE}} — ts and origin have mismatched 
types.


> Add time_bucket scalar function for interval-based timestamp bucketing
> ----------------------------------------------------------------------
>
>                 Key: SPARK-56594
>                 URL: https://issues.apache.org/jira/browse/SPARK-56594
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Nikolina Vraneš
>            Assignee: Nikolina Vraneš
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0
>
>
> h2. Summary
> Add a new scalar function {{time_bucket(bucketSize, ts [, origin])}} that 
> returns the start of the fixed-size time bucket containing {{{}ts{}}}, 
> aligned to a configurable origin point.
> h2. Syntax
> {code:sql}
>   time_bucket(bucketSize, ts [, origin])
>   {code}
> h2. Arguments
>  * {{{}bucketSize{}}}: an interval expression (day-time or year-month) 
> defining the bucket size. Must be positive and foldable.
>  * {{{}ts{}}}: a TIMESTAMP or TIMESTAMP_NTZ expression to be bucketed.
>  * {{origin}} (optional): alignment anchor. Defaults to {{{}1970-01-01 
> 00:00:00{}}}. Must be the same type as {{ts}} and must be foldable.
> h2. Return type
> Same as {{ts}} (TIMESTAMP or TIMESTAMP_NTZ).
> h2. Behavior
>  * Divides the time axis into consecutive, non-overlapping intervals of the 
> specified size, anchored at {{{}origin{}}}.
>  * Returns the start of the half-open bucket {{[start, start + bucketSize)}} 
> containing {{{}ts{}}}.
>  * For `TIMESTAMP_NTZ`, bucketing is performed in UTC. For `TIMESTAMP`, 
> year-month interval buckets and calendar-day components of day-time interval 
> buckets align to the session time zone, matching `+ INTERVAL 
> '<k*bucket_size>'` semantics across DST transitions.
>  * Any NULL argument returns NULL.
> h2. Examples
> {code:sql}
>   SELECT time_bucket(INTERVAL '15' MINUTE, TIMESTAMP '2024-01-01 11:27:00');
>   -- 2024-01-01 11:15:00                                                      
>                                                                               
>                                                       
>    
>   SELECT time_bucket(INTERVAL '1' HOUR, TIMESTAMP '2024-01-01 11:27:00');     
>                                                                               
>                                                       
>   -- 2024-01-01 11:00:00                            
>                                                                               
>                                                                               
>                                                       
>   SELECT time_bucket(INTERVAL '1' MONTH, TIMESTAMP '2024-03-15 11:27:00');    
>                                                                               
>                                                       
>   -- 2024-03-01 00:00:00
>                                                                               
>                                                                               
>                                                       
>   SELECT time_bucket(INTERVAL '1' HOUR,                                       
>                                                                               
>                                                       
>                     TIMESTAMP '2024-01-01 11:27:00',
>                     TIMESTAMP '1970-01-01 00:05:00');                         
>                                                                               
>                                                       
>   -- 2024-01-01 11:05:00                                                      
>                                                                               
>                                                       
>   {code}
> h2. Errors
>  * {{DATATYPE_MISMATCH.NON_FOLDABLE_INPUT}} — bucketSize or origin is not 
> foldable.
>  * {{DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE}} — bucketSize is zero or negative.
>  * {{DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE}} — ts and origin have 
> mismatched types.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to