Re: [DISCUSS] FIP-21: Aggregation Merge Engine

Yang Wang, Sun, 07 Dec 2025 23:30:17 -0800

Hi Cheng,
Thanks for the thoughtful feedback and for bringing up the RocksDB 2PC
approach.
You've identified the core challenge precisely: data visibility vs.
real-time processing. This is exactly why we chose the Undo Recovery
mechanism over transaction-based approaches in the proposal.

*Key Considerations:*

*1. Real-time Visibility Conflict*
As you mentioned, RocksDB 2PC would require delaying visibility until the
transaction commits. For Fluss's positioning as real-time streaming storage,
this conflicts with our fundamental requirement that writes be immediately
queryable. In typical scenarios (e.g., real-time dashboards), users expect
second-level updates rather than waiting for a Flink checkpoint to complete
(which can take tens of seconds).
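To make the visibility difference concrete, here is a toy Python model. It is not Fluss or RocksDB code: `ImmediateStore`, `TwoPhaseStore`, and `commit()` are hypothetical names, and the model assumes the commit is triggered by checkpoint completion. The 2PC-style store stages writes until that commit, while the immediate store publishes each write as soon as it is applied.

```python
from dataclasses import dataclass, field


@dataclass
class ImmediateStore:
    """Toy model: every write is visible as soon as it is applied."""
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


@dataclass
class TwoPhaseStore:
    """Toy model of 2PC-style delayed visibility: writes are staged in a
    pending buffer and only published when the transaction commits."""
    data: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def write(self, key, value):
        self.pending[key] = value  # prepared, not yet visible to readers

    def commit(self):
        # Hypothetically invoked when the Flink checkpoint completes.
        self.data.update(self.pending)
        self.pending.clear()

    def read(self, key):
        return self.data.get(key)  # readers only see committed data


immediate, staged = ImmediateStore(), TwoPhaseStore()
for store in (immediate, staged):
    store.write("page_views", 42)

print(immediate.read("page_views"))  # 42: visible right away
print(staged.read("page_views"))     # None: hidden until commit
staged.commit()
print(staged.read("page_views"))     # 42
```

In the staged model, the gap between `write` and `commit` is exactly the checkpoint interval, which is the latency a dashboard reader would observe.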

*2. Already Evaluated and Rejected*
We actually evaluated transaction mechanisms in the FIP design phase. From
the "Rejected Alternatives" section:
> Use Transaction Mechanism to Implement Exactly-Once
>
> Disadvantages: Extremely high implementation complexity, requires
> refactoring Fluss's write path, high performance overhead (requires delayed
> visibility, increased commit overhead), conflicts with Fluss's real-time
> visibility design philosophy
>
> Rejection Reason: Cost too high, inconsistent with Fluss's real-time
> streaming storage positioning
(See FIP Section: "Rejected Alternatives")

*3. Additional Complexity with RocksDB 2PC*
Beyond visibility issues:

   - Distributed coordination: Requires a global transaction coordinator
   across multiple TabletServers
   - Flink checkpoint alignment: How to coordinate RocksDB commit with
   asynchronous Flink checkpoints?
   - Multi-job concurrency: Column-level partial updates would require
   complex transaction isolation coordination
   - Performance overhead: Prepare/commit overhead exists for every write,
   even in normal cases

*4. Why Undo Recovery Fits Better*
Our approach optimizes for the common case:

   - Normal writes: Zero transaction overhead, immediate visibility
   - Failover (rare): Pay the cost of undo operations only when needed
   - Lightweight: Leverages existing Changelog capability, no global
   coordinator needed
   - Localized: Each bucket handles recovery independently via offset
   comparison
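A minimal Python sketch of the idea in the bullets above, under loudly stated assumptions: each bucket's changelog keeps a before-image for every write, the checkpoint records each bucket's log end offset, there is a single writer per bucket, and the merge function is SUM. `Bucket`, `ChangelogEntry`, `undo_to`, and `recover` are hypothetical illustrative names, not Fluss APIs, and this is not the FIP's actual implementation. For brevity the toy truncates its changelog on undo; an append-only log would more likely emit compensating records instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChangelogEntry:
    offset: int
    key: str
    before: Optional[int]  # value before the write (None = key was absent)
    after: int


@dataclass
class Bucket:
    """Toy bucket: SUM-aggregated state plus a changelog of before/after images."""
    state: Dict[str, int] = field(default_factory=dict)
    changelog: List[ChangelogEntry] = field(default_factory=list)

    def log_end_offset(self) -> int:
        return len(self.changelog)

    def aggregate(self, key: str, delta: int) -> None:
        # Normal path: apply immediately, no prepare/commit step.
        before = self.state.get(key)
        after = (before or 0) + delta
        self.changelog.append(ChangelogEntry(self.log_end_offset(), key, before, after))
        self.state[key] = after

    def undo_to(self, checkpoint_offset: int) -> int:
        """Roll back every write at or after checkpoint_offset; return how many."""
        if self.log_end_offset() <= checkpoint_offset:
            return 0  # offset comparison: nothing written since the checkpoint
        undone = 0
        # Undo newest-first so each before-image is restored in the right order.
        for entry in reversed(self.changelog[checkpoint_offset:]):
            if entry.before is None:
                self.state.pop(entry.key, None)
            else:
                self.state[entry.key] = entry.before
            undone += 1
        del self.changelog[checkpoint_offset:]  # simplification (see lead-in)
        return undone


def recover(buckets: Dict[int, Bucket], checkpointed: Dict[int, int]) -> Dict[int, int]:
    """Each bucket recovers independently; no global coordinator is involved."""
    return {bid: b.undo_to(checkpointed.get(bid, 0)) for bid, b in buckets.items()}


bucket = Bucket()
bucket.aggregate("clicks", 5)
checkpoint_offset = bucket.log_end_offset()  # recorded at checkpoint time

# Writes after the checkpoint are visible immediately...
bucket.aggregate("clicks", 3)
bucket.aggregate("views", 7)
# ...then the job fails before the next checkpoint completes.

recover({0: bucket}, {0: checkpoint_offset})
print(bucket.state)  # {'clicks': 5}

# Replaying the same input from the checkpoint does not double-count.
bucket.aggregate("clicks", 3)
bucket.aggregate("views", 7)
print(bucket.state)  # {'clicks': 8, 'views': 7}
```

The cost of `undo_to` is proportional to the number of writes since the last checkpoint and is paid only on failover, which is the trade-off the bullets describe.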

*Summary*
While RocksDB 2PC is theoretically cleaner from a database perspective, it
introduces unacceptable trade-offs for Fluss's real-time streaming use
cases. The Undo Recovery approach better aligns with our "optimize for the
common path" philosophy and maintains Fluss's real-time characteristics.
Would love to discuss further if you have additional thoughts!

Best regards,
Yang
