J0hnG4lt opened a new issue, #405:
URL: https://github.com/apache/fluss-rust/issues/405

   ### Search before asking
   
   - [x] I searched in the 
[issues](https://github.com/apache/fluss-rust/issues) and found nothing similar.
   
   ### Description
   
   The Rust client currently rejects log scanning for PrimaryKey tables 
entirely, returning an `UnsupportedOperation` error in `scanner.rs`. However, 
the Java client supports this when the table's log format is ARROW — the wire 
format includes a per-record `ChangeType` byte array (Insert, UpdateBefore, 
UpdateAfter, Delete) before the Arrow IPC data in non-append-only batches.
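The four change types above can be sketched as a small Rust enum with a byte decoder. This is illustrative only: the type name, error type, and the concrete byte values are assumptions (the authoritative mapping is the Java `ChangeType` enum in the Fluss codebase).

```rust
/// Hypothetical sketch of the per-record change-type byte described above.
/// The byte values (0..=3) are assumed for illustration, not taken from the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeType {
    Insert,
    UpdateBefore,
    UpdateAfter,
    Delete,
}

impl ChangeType {
    /// Decode a single wire byte, rejecting unknown values instead of panicking.
    pub fn from_byte(b: u8) -> Result<Self, String> {
        match b {
            0 => Ok(ChangeType::Insert),
            1 => Ok(ChangeType::UpdateBefore),
            2 => Ok(ChangeType::UpdateAfter),
            3 => Ok(ChangeType::Delete),
            other => Err(format!("unknown ChangeType byte: {other}")),
        }
    }
}
```

Returning a `Result` (rather than panicking) keeps a corrupted batch from taking down the whole scanner loop.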
   
   **Motivation / use case:**
   
   I'm an independent contributor (not affiliated with either project) working 
on adding an Apache Fluss data connector to 
[SpiceAI](https://github.com/spiceai/spiceai) — a portable accelerated SQL 
query engine written in Rust. SpiceAI already supports CDC streaming from 
sources like DynamoDB Streams and Debezium/Kafka, and I'd like to add Fluss as 
another CDC-capable data source. This requires the Rust client to support log 
scanning on PK tables so that CDC events (Insert, Update, Delete) can be 
streamed into SpiceAI's accelerated materialized views for real-time querying.
   
   This work has been done with the assistance of Claude (Anthropic's AI).
   
   I have a working implementation in my fork: 
[J0hnG4lt/fluss-rust#2](https://github.com/J0hnG4lt/fluss-rust/pull/2) 
(`feat/pk-table-arrow-cdc-v2` branch, single commit). I'd love to contribute 
this upstream and am very open to reviews, suggestions, and any changes needed 
to align with the project's direction.
   
   **Changes in the implementation:**
   
   1. **`arrow.rs`** — Parse `ChangeTypeVector` bytes for non-append-only 
`DefaultLogRecordBatch`. The `is_append_only()` flag is bit 0 of the batch 
attributes. When not append-only, the first `record_count` bytes after the 
fixed header are per-record `ChangeType` values, followed by the Arrow IPC 
payload.
   
   2. **`scanner.rs`** — Replace the blanket PK table rejection with a format 
check: only reject non-ARROW formats (INDEXED format scanning is genuinely 
unsupported). PK tables with ARROW format are allowed.
   
   3. **Tests** — Unit tests for ChangeType byte parsing (all 4 change types + 
error cases) and integration tests for PK table CDC (insert, update, delete 
change types) using the shared test cluster.
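The batch split in item 1 can be sketched roughly as follows. Function and parameter names are hypothetical (the real offsets and types live in the fork's `arrow.rs` changes), and the polarity of the attributes bit (set = append-only) is assumed for illustration.

```rust
/// Sketch of the non-append-only batch layout from item 1, after the fixed header:
///   append-only batch:      [ Arrow IPC payload ]
///   non-append-only batch:  [ record_count ChangeType bytes ][ Arrow IPC payload ]
/// Returns the raw change-type bytes plus a slice over the Arrow IPC payload.
fn split_change_types(
    attributes: u8,
    record_count: usize,
    body: &[u8], // bytes following the fixed batch header
) -> Result<(Vec<u8>, &[u8]), String> {
    // is_append_only() is bit 0 of the batch attributes (set = append-only, assumed).
    let append_only = attributes & 0x01 != 0;
    if append_only {
        // No per-record change types; every record is an implicit Insert.
        return Ok((Vec::new(), body));
    }
    if body.len() < record_count {
        return Err(format!(
            "batch too short: need {record_count} change-type bytes, have {}",
            body.len()
        ));
    }
    let (change_types, arrow_ipc) = body.split_at(record_count);
    Ok((change_types.to_vec(), arrow_ipc))
}
```

Splitting before handing the payload to the Arrow IPC reader keeps the change-type handling isolated from the existing decode path, which is what makes the append-only case a pure pass-through.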
   
   **Reference:** Java implementation in 
`DefaultLogRecordBatch.columnRecordIterator()` and `LogScanner` (which does not 
restrict by table type).
   
   ### Willingness to contribute
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]