[
https://issues.apache.org/jira/browse/SPARK-56820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-56820:
-----------------------------------
Labels: pull-request-available (was: )
> counter_diff window function for converting cumulative counters to delta
> format
> -------------------------------------------------------------------------------
>
> Key: SPARK-56820
> URL: https://issues.apache.org/jira/browse/SPARK-56820
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Petar Nikic
> Priority: Major
> Labels: pull-request-available
>
> h2. Summary
> Create a new window function, {{counter_diff}} for converting a cumulative
> counter metric to delta format by computing the difference between
> consecutive counter values within a time series.
> h2. Syntax
> {code:sql}
> {{counter_diff ( value [, start_time] ) OVER clause}}
> {code}
> h2. Arguments
> * {*}value{*}: A cumulative counter. Must be numeric and non-negative.
> * {*}start_time{*}: An optional timestamp parameter which indicates when the
> counter was last set or reset to zero. It is used to signal counter resets.
> * {*}OVER clause{*}: The clause describing the windowing.
> ** {*}PARTITION BY{*}: used to separate independent counters. Good
> partitioning columns would be the metric name, as well as any attributes tied
> to the metric.
> ** {*}ORDER BY{*}: used to order the observations by the associated
> timestamp in ascending order.
> h2. Returns
> Returns the difference between the current counter value and the previous
> counter value, according to the order defined by ORDER BY in the OVER clause.
> The return type is the same as {{{}value{}}}.
> Counter resets are handled by returning {{{}NULL{}}}. Resets are detected in
> two cases:
> * If the previous {{value}} is greater than the current {{{}value{}}}.
> * If {{start_time}} is provided, and the current time is greater than the
> previous one.
> In addition, {{NULL}} is returned for the first row of the partition.
> Errors are raised in the following cases:
> * the provided counter is negative or,
> * the {{start_time}} moves backwards in time.
> h2. Examples
> *_Example with a simple counter_*
> {code:sql}
> SELECT m, t, counter_diff(c) OVER (PARTITION BY m ORDER BY t)
> FROM VALUES
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:00:00’, 100),
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’, 200),
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’, 400)
> AS tab (m, t, c)
> {code}
> {code:java}
> http_requests 2026-01-01 00:00:00 NULL (first row)
> http_requests 2026-01-01 00:01:00 100
> http_requests 2026-01-01 00:02:00 200
> {code}
> *_Example with a counter reset_*
> {code:sql}
> SELECT m, t, counter_diff(c) OVER (PARTITION BY m ORDER BY t)
> FROM VALUES
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:00:00’, 100),
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’, 200),
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’, 400),
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:04:00’, 50)
> AS tab (m, t, c)
> {code}
> {code:java}
> http_requests 2026-01-01 00:00:00 NULL
> http_requests 2026-01-01 00:01:00 100
> http_requests 2026-01-01 00:02:00 200
> http_requests 2026-01-01 00:04:00 NULL (counter: 400 -> 50)
> {code}
> *_Example with a counter reset with increased start_time_*
> {code:sql}
> SELECT m, t, counter_diff(c, st) OVER (PARTITION BY m ORDER BY t)
> FROM VALUES
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:01:00’,
> TIMESTAMP ‘2026-01-01T00:00:00’, 100),
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:02:00’,
> TIMESTAMP ‘2026-01-01T00:00:00’, 200),
> (‘http_requests’, TIMESTAMP ‘2026-01-01T00:04:00’,
> TIMESTAMP ‘2026-01-01T00:02:31’, 400)
> AS tab (m, t, st, c)
> {code}
> {code:java}
> http_requests 2026-01-01 00:01:00 NULL
> http_requests 2026-01-01 00:02:00 100
> http_requests 2026-01-01 00:04:00 NULL (start_time: 00:00 -> 02:31)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]