Liu created FLINK-39030:
---------------------------
Summary: Support emit-only-on-update mode for CUMULATE window TVF
to reduce unnecessary outputs
Key: FLINK-39030
URL: https://issues.apache.org/jira/browse/FLINK-39030
Project: Flink
Issue Type: Improvement
Components: Table SQL / API
Reporter: Liu
h1. Summary
Currently, the CUMULATE window TVF (table-valued function) emits a result at
every step interval, regardless of whether new data arrived during that interval.
When data arrives sparsely, this produces repeated outputs and adds downstream
processing overhead.
This proposal introduces an optional emit-only-on-update mode for CUMULATE
windows that only emits results when new data actually arrives within a step
interval.
h1. Motivation
Consider a CUMULATE window with maxSize=15 seconds and step=5 seconds, starting
at 00:00:00:
||Window||Time range||Triggered at||
|1|[00:00:00, 00:00:05)|00:00:05|
|2|[00:00:00, 00:00:10)|00:00:10|
|3|[00:00:00, 00:00:15)|00:00:15|
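The window layout in the table can be reproduced with a small sketch. This is an illustration only, not Flink code: the helper name cumulate_windows and the use of plain integer seconds are assumptions made for this example. It lists every cumulative window that contains a given timestamp.

```python
def cumulate_windows(ts, step, max_size, offset=0):
    """Return the (start, end) cumulative windows containing ts.

    All values are integer seconds. Hypothetical helper for illustration;
    it mirrors the table above rather than Flink's internal window assigner.
    """
    # Start of the maxSize cycle that ts falls into.
    start = ts - (ts - offset) % max_size
    # End of the first step-aligned window that already covers ts.
    first_end = start + ((ts - start) // step + 1) * step
    # Every later window in the same cycle also covers ts.
    return [(start, end) for end in range(first_end, start + max_size + 1, step)]


# A record at 00:00:02 with step=5s and maxSize=15s belongs to all three windows.
print(cumulate_windows(2, step=5, max_size=15))  # [(0, 5), (0, 10), (0, 15)]
```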
*Problem:* If the only record arrives at 00:00:02, the current implementation
still emits three outputs (at 00:00:05, 00:00:10, and 00:00:15), even though
windows 2 and 3 contain no new data compared to window 1.
*Impact:*
* Unnecessary computational overhead
* Increased network I/O for downstream operators
* Higher storage costs when persisting results
* Significant performance degradation in scenarios with sparse data
distribution
h1. Proposed Solution
Add an optional sixth parameter, emit_on_update_only (BOOLEAN, default false),
to the CUMULATE TVF:
{code:sql}
CUMULATE(
  TABLE data_table,
  DESCRIPTOR(rowtime),
  INTERVAL '5' SECOND,   -- step
  INTERVAL '15' SECOND,  -- maxSize
  INTERVAL '0' SECOND,   -- offset
  true                   -- emit_on_update_only (NEW)
)
{code}
Behavior when emit_on_update_only = true:
* Window output is only triggered when new data arrives within the current
step interval
* If no new data arrives, the window silently advances without emitting results
* This is particularly useful for sparse data streams where most step
intervals have no data
Behavior when emit_on_update_only = false (default):
* Current behavior is preserved (output at every step interval)
* Ensures backward compatibility
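The difference between the two modes can be sketched as a small simulation. This is a model of the semantics described above, not the proposed implementation: the function emit_times is hypothetical, timestamps are integer seconds, the offset is taken as 0, and the default mode is assumed to emit every cumulative window that contains at least one record.

```python
def emit_times(event_times, step, max_size, emit_on_update_only=False):
    """Return the sorted window end times at which a result would be emitted.

    Hypothetical model for illustration (integer seconds, offset 0).
    """
    fired = set()
    for ts in event_times:
        # Start of the maxSize cycle that ts falls into.
        start = ts - ts % max_size
        # End of the step interval in which ts arrived.
        first_end = start + ((ts - start) // step + 1) * step
        if emit_on_update_only:
            # Only the step interval that actually received the record fires.
            fired.add(first_end)
        else:
            # Assumed current behavior: every remaining window in the cycle fires.
            fired.update(range(first_end, start + max_size + 1, step))
    return sorted(fired)


# One record at 00:00:02, step=5s, maxSize=15s (the Motivation example).
print(emit_times([2], 5, 15))                            # [5, 10, 15]
print(emit_times([2], 5, 15, emit_on_update_only=True))  # [5]
```

Under these assumptions, a second record at 00:00:12 would make the new mode emit at 00:00:05 and 00:00:15 and skip only the idle interval ending at 00:00:10.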