Hi all,

Aggregation is one of the most common requirements in real-time data processing, widely used in scenarios such as real-time OLAP, reporting, and monitoring. However, the current approach of performing aggregation in the Flink compute layer faces significant state management challenges that limit system scalability and performance.

As the number of unique keys grows, Flink's aggregation state expands linearly, leading to several critical issues:

- State bloat that consumes massive memory and disk resources
- Long checkpoint durations that affect RTO and cause backpressure
- Expensive state redistribution that makes scaling difficult
- A limit on the data scale a single job can handle

By pushing aggregation down to the storage engine layer, we can address these issues at the root. Aggregation state would migrate from the Flink State Backend to Fluss storage, making Flink jobs nearly stateless. This approach leverages Fluss's LSM structure for efficient storage, enables low-latency primary key queries on aggregation results, and significantly reduces resource consumption.

Additionally, Apache Paimon has already implemented a comprehensive Aggregation Merge Engine. As Flink's streaming storage layer, working with Paimon in the stream-lake integration architecture, Fluss needs to align with this core capability so that users can switch seamlessly between stream tables and lake tables.

To implement this, I'd like to propose FIP-21: Aggregation Merge Engine [1].

This proposal introduces:

- An extensible aggregation framework supporting 11 common functions (sum, max, min, product, listagg, bool_and, bool_or, etc.) with field-level configuration
- Exactly-once semantics through a changelog-based undo recovery mechanism
- Column lock coordination that enables concurrent writes from multiple jobs without conflicts

Any feedback and suggestions are welcome!

[1]: https://cwiki.apache.org/confluence/display/FLUSS/FIP-21%3A+Aggregation+Merge+Engine

Best regards,
Yang
