Hi all,

Change Data Capture (CDC) is essential for real-time data processing and is widely used in scenarios like incremental ETL, data synchronization, and audit logging. However, Fluss currently lacks a standardized mechanism to expose changelog and binlog data through Flink SQL, forcing users to implement custom change tracking solutions.

Without native support, users face several challenges:

- No SQL interface to query operation types (INSERT, UPDATE, DELETE)
- No efficient access to change data for incremental processing
- No consistent audit trails across applications
- No visibility into before/after states for UPDATE operations

To address this, I'd like to propose FIP-20: Introduce $changelog and $binlog Virtual Tables [1].

[1] https://cwiki.apache.org/confluence/display/FLUSS/FIP-20+Introduce+%24changelog+and+%24binlog+Virtual+Tables+in+Flink+Engine

This proposal introduces:

- *$changelog virtual table*: Flat schema with metadata columns (_change_type, _log_offset, _commit_timestamp) exposing change operations (+I, -U, +U, -D for PK tables; +A for Log tables)
- *$binlog virtual table*: Nested schema with before/after ROW columns providing Debezium-style CDC format with event-level semantics (I, U, D)
- *Schema introspection*: Support for DESCRIBE and SHOW CREATE TABLE on virtual tables
- *Broad table support*: $changelog works for both Primary Key tables and Log tables; $binlog for Primary Key tables only

This follows the existing $lake virtual table pattern, ensuring consistency across Fluss's virtual table design.

Any feedback and suggestions are welcome!

Best regards,
Mehul
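
P.S. To make the difference between the two views concrete, here is a rough Python sketch of how row-level changelog records (+I, -U, +U, -D) could be folded into event-level binlog records (I, U, D) with before/after images. This is only an illustration of the semantics, not the FIP's implementation: the dict layout, the "row" and "op" keys, and the function name are hypothetical, and it assumes each -U is directly followed by its matching +U.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Row = Dict[str, Any]


def to_binlog_events(changelog: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Fold row-level changes into event-level records with before/after images.

    Hypothetical record layout: {"_change_type": "+I" | "-U" | "+U" | "-D", "row": {...}}.
    """
    pending_before: Optional[Row] = None  # holds the -U image until its +U arrives
    for rec in changelog:
        kind, row = rec["_change_type"], rec["row"]
        if kind == "+I":
            yield {"op": "I", "before": None, "after": row}
        elif kind == "-U":
            pending_before = row
        elif kind == "+U":
            yield {"op": "U", "before": pending_before, "after": row}
            pending_before = None
        elif kind == "-D":
            yield {"op": "D", "before": row, "after": None}
        else:
            raise ValueError(f"unexpected change type: {kind!r}")


# Four changelog rows for one primary key collapse into three binlog events.
changelog = [
    {"_change_type": "+I", "row": {"id": 1, "qty": 5}},
    {"_change_type": "-U", "row": {"id": 1, "qty": 5}},
    {"_change_type": "+U", "row": {"id": 1, "qty": 7}},
    {"_change_type": "-D", "row": {"id": 1, "qty": 7}},
]
events = list(to_binlog_events(changelog))
for event in events:
    print(event)
```

The point of the sketch is that an UPDATE shows up as two rows in the changelog view but as a single event carrying both states in the binlog view.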
