Hi Cheng,

Thanks for the thoughtful feedback and for bringing up the RocksDB 2PC approach. You've identified the core challenge precisely: data visibility vs. real-time processing. This is exactly why we chose the Undo Recovery mechanism over transaction-based approaches in the proposal.

*Key Considerations:*

*1. Real-time Visibility Conflict*
As you mentioned, RocksDB 2PC would require delaying visibility until the transaction commits. Fluss is positioned as real-time streaming storage, so this conflicts with our fundamental requirement that writes be immediately queryable. In typical scenarios (e.g., real-time dashboards), users expect updates within seconds, rather than waiting for a Flink checkpoint to complete (which could take tens of seconds).

*2. Already Evaluated and Rejected*
We actually evaluated transaction mechanisms in the FIP design phase. From the "Rejected Alternatives" section:

> Use Transaction Mechanism to Implement Exactly-Once
>
> Disadvantages: Extremely high implementation complexity, requires refactoring Fluss's write path, high performance overhead (requires delayed visibility, increased commit overhead), conflicts with Fluss's real-time visibility design philosophy
>
> Rejection Reason: Cost too high, inconsistent with Fluss's real-time streaming storage positioning

(See FIP Section: "Rejected Alternatives")

*3. Additional Complexity with RocksDB 2PC*
Beyond visibility issues:
- Distributed coordination: Requires a global transaction coordinator across multiple TabletServers
- Flink checkpoint alignment: How to coordinate RocksDB commit with asynchronous Flink checkpoints?
- Multi-job concurrency: Column-level partial updates would require complex transaction isolation coordination
- Performance overhead: Prepare/commit overhead exists for every write, even in normal cases

*4. Why Undo Recovery Fits Better*
Our approach optimizes for the common case:
- Normal writes: Zero transaction overhead, immediate visibility
- Failover (rare): Pay the cost of undo operations only when needed
- Lightweight: Leverages existing Changelog capability, no global coordinator needed
- Localized: Each bucket handles recovery independently via offset comparison

*Summary*
While RocksDB 2PC is theoretically cleaner from a database perspective, it introduces unacceptable trade-offs for Fluss's real-time streaming use cases. The Undo Recovery approach better aligns with our "optimize for the common path" philosophy and maintains Fluss's real-time characteristics.

Would love to discuss further if you have additional thoughts!

Best regards,
Yang
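
P.S. In case a concrete picture helps, here is a toy Python sketch of the per-bucket undo idea. It is not Fluss code: `Bucket`, `ChangelogEntry`, `write`, and `undo_to` are hypothetical names, and the sketch assumes that every changelog entry carries the before-image of its key and that the toy log can simply be truncated after the undo. The actual mechanism in the FIP may differ on both points.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChangelogEntry:
    offset: int
    key: str
    before: Optional[str]  # value before the write; None if the key was absent
    after: Optional[str]   # value after the write; None for a delete


class Bucket:
    """Toy bucket: an in-memory KV state plus its changelog (hypothetical model)."""

    def __init__(self) -> None:
        self.kv: Dict[str, str] = {}
        self.changelog: List[ChangelogEntry] = []

    def end_offset(self) -> int:
        # Offset that the next changelog entry would receive.
        return len(self.changelog)

    def write(self, key: str, value: Optional[str]) -> int:
        # The write is applied immediately, so readers see it right away;
        # there is no prepare/commit step on the normal path.
        entry = ChangelogEntry(self.end_offset(), key, self.kv.get(key), value)
        self.changelog.append(entry)
        if value is None:
            self.kv.pop(key, None)
        else:
            self.kv[key] = value
        return entry.offset

    def undo_to(self, checkpoint_offset: int) -> int:
        # Offset comparison: nothing to undo if the log has not advanced
        # past the offset recorded in the last successful checkpoint.
        if checkpoint_offset >= self.end_offset():
            return 0
        undone = 0
        # Walk the uncheckpointed entries newest-first and restore before-images.
        for entry in reversed(self.changelog[checkpoint_offset:]):
            if entry.before is None:
                self.kv.pop(entry.key, None)
            else:
                self.kv[entry.key] = entry.before
            undone += 1
        # Simplification: the toy truncates its log; a real append-only log
        # would more likely record compensating entries instead.
        del self.changelog[checkpoint_offset:]
        return undone
```

In this toy model, a bucket that wrote `k1=a`, took a checkpoint at offset 1, and then wrote `k1=b` and `k2=c` would serve both new values immediately, and after a failover `undo_to(1)` would restore the state to just `k1=a`. Each bucket needs only its own checkpointed offset to do this, which is why no global coordinator shows up anywhere in the sketch.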
