[ 
https://issues.apache.org/jira/browse/SPARK-56820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-56820:
-----------------------------------
    Labels: pull-request-available  (was: )

> counter_diff window function for converting cumulative counters to delta 
> format
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-56820
>                 URL: https://issues.apache.org/jira/browse/SPARK-56820
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Petar Nikic
>            Priority: Major
>              Labels: pull-request-available
>
> h2. Summary
> Create a new window function, {{counter_diff}} for converting a cumulative 
> counter metric to delta format by computing the difference between 
> consecutive counter values within a time series.
> h2. Syntax
> {code:sql}
> {{counter_diff ( value [, start_time] ) OVER clause}}
> {code}
> h2. Arguments
>  * {*}value{*}: A cumulative counter. Must be numeric and non-negative.
>  * {*}start_time{*}: An optional timestamp parameter which indicates when the 
> counter was last set or reset to zero. It is used to signal counter resets.
>  * {*}OVER clause{*}: The clause describing the windowing.
>  ** {*}PARTITION BY{*}: used to separate independent counters. Good 
> partitioning columns would be the metric name, as well as any attributes tied 
> to the metric.
>  ** {*}ORDER BY{*}: used to order the observations by the associated 
> timestamp in ascending order.
> h2. Returns
> Returns the difference between the current counter value and the previous 
> counter value, according to the order defined by ORDER BY in the OVER clause. 
> The return type is the same as {{{}value{}}}.
> Counter resets are handled by returning {{{}NULL{}}}. Resets are detected in 
> two cases:
>  * If the previous {{value}} is greater than the current {{{}value{}}}.
>  * If {{start_time}} is provided, and the current time is greater than the 
> previous one.
> In addition, {{NULL}} is returned for the first row of the partition.
> Errors are raised in the following cases:
>  * the provided counter is negative or,
>  * the {{start_time}} moves backwards in time.
> h2. Examples
> *_Example with a simple counter_*
> {code:sql}
> SELECT m, t, counter_diff(c) OVER (PARTITION BY m ORDER BY t)
> FROM VALUES
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:00:00’, 100),
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’, 200),
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’, 400)
>     AS tab (m, t, c)
> {code}
> {code:java}
> http_requests 2026-01-01 00:00:00     NULL (first row)
> http_requests 2026-01-01 00:01:00     100
> http_requests 2026-01-01 00:02:00     200
> {code}
> *_Example with a counter reset_*
> {code:sql}
> SELECT m, t, counter_diff(c) OVER (PARTITION BY m ORDER BY t)
> FROM VALUES
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:00:00’, 100),
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’, 200),
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’, 400),
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:04:00’, 50)
>   AS tab (m, t, c)
> {code}
> {code:java}
> http_requests 2026-01-01 00:00:00     NULL
> http_requests 2026-01-01 00:01:00     100
> http_requests 2026-01-01 00:02:00     200
> http_requests 2026-01-01 00:04:00     NULL (counter: 400 -> 50)
> {code}
> *_Example with a counter reset with increased start_time_*
> {code:sql}
> SELECT m, t, counter_diff(c, st) OVER (PARTITION BY m ORDER BY t)
> FROM VALUES
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’,
>    TIMESTAMP ‘2026-01-01T00:00:00’, 100),
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’,
>    TIMESTAMP ‘2026-01-01T00:00:00’, 200),
>   (‘http_requests’, TIMESTAMP ‘2026-01-01T00:04:00’,
>    TIMESTAMP ‘2026-01-01T00:02:31’, 400)
>   AS tab (m, t, st, c)
> {code}
> {code:java}
> http_requests 2026-01-01 00:01:00     NULL
> http_requests 2026-01-01 00:02:00     100
> http_requests 2026-01-01 00:04:00     NULL (start_time: 00:00 -> 02:31)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to