laskoviymishka opened a new pull request, #789: URL: https://github.com/apache/iceberg-go/pull/789
Adds `Transaction.NewRowDelta()`, the Go equivalent of Java's `BaseRowDelta`. It commits data files and delete files (position or equality) in one atomic snapshot. This is needed for row-level mutations: an UPDATE becomes an equality delete for the old row plus an append of the new row, both in one commit. Resolves #602.

## API

```go
rd := tx.NewRowDelta(snapshotProps)
rd.AddRows(dataFile1, dataFile2)
rd.AddDeletes(posDeleteFile, eqDeleteFile)
rd.Commit(ctx)
```

The operation type is picked automatically: data files only produces `append`, delete files only produces `delete`, both produce `overwrite`.

## Validation

- Delete files require format version >= 2
- Equality deletes must have non-empty `EqualityFieldIDs` referencing existing schema columns
- Content types are checked: no data files in `AddDeletes`, no delete files in `AddRows`

## Known limitations

- No conflict detection for concurrent writers; documented in the type comment
- Uses the fast-append producer (no manifest merging)

## What's tested

The interesting ones:

- Commit data + position deletes; check that the snapshot summary has `added-data-files=1` and `added-delete-files=1`, and that the operation is `overwrite`
- Commit equality deletes; check that `added-equality-delete-files` shows up in the summary
- Read back manifests after commit; verify there is one data manifest and one delete manifest with the correct content types in their entries
- Two RowDeltas on the same transaction (batch1 append, batch2 append+delete); verify cumulative `total-data-files`
- A v1 table rejects delete files with a clear error
- Equality delete file without field IDs → error
- Equality delete file with field ID 999 (not in the schema) → error

The round-trip integration test:

1. Write 5 rows as real Parquet, append to the table
2. Write a position delete file targeting positions 1 and 3, commit via RowDelta
3. Scan the table back: get 3 rows, verify the IDs are `[1, 3, 5]` (beta and delta gone)

This covers the full path: write Parquet → RowDelta commit → scan with position delete filtering applied.
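The automatic operation selection can be sketched as a small pure function. This is a minimal illustration, not the actual iceberg-go code; `pickOperation` is a hypothetical helper name:

```go
package main

import "fmt"

// pickOperation mirrors the rule described above: the snapshot
// operation is derived from which kinds of files the RowDelta carries.
// (Hypothetical helper, not the real iceberg-go implementation.)
func pickOperation(numDataFiles, numDeleteFiles int) string {
	switch {
	case numDeleteFiles == 0:
		return "append" // data files only
	case numDataFiles == 0:
		return "delete" // delete files only
	default:
		return "overwrite" // both data and delete files
	}
}

func main() {
	fmt.Println(pickOperation(2, 0)) // append
	fmt.Println(pickOperation(0, 1)) // delete
	fmt.Println(pickOperation(1, 1)) // overwrite
}
```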
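The validation rules above amount to a few checks per delete file. The sketch below uses illustrative type and function names (`DeleteFile`, `checkEqualityDelete`), not the actual iceberg-go types:

```go
package main

import (
	"errors"
	"fmt"
)

// DeleteFile is a hypothetical stand-in for the fields the validation
// rules inspect; the names here are illustrative only.
type DeleteFile struct {
	EqualityFieldIDs []int
}

// checkEqualityDelete sketches the rules: delete files need format
// version >= 2, equality field IDs must be non-empty, and every ID
// must reference an existing schema column.
func checkEqualityDelete(f DeleteFile, formatVersion int, schemaFieldIDs map[int]bool) error {
	if formatVersion < 2 {
		return errors.New("delete files require format version >= 2")
	}
	if len(f.EqualityFieldIDs) == 0 {
		return errors.New("equality delete file has no equality field IDs")
	}
	for _, id := range f.EqualityFieldIDs {
		if !schemaFieldIDs[id] {
			return fmt.Errorf("equality field ID %d not found in schema", id)
		}
	}
	return nil
}

func main() {
	schema := map[int]bool{1: true, 2: true}
	fmt.Println(checkEqualityDelete(DeleteFile{EqualityFieldIDs: []int{1}}, 2, schema))
	fmt.Println(checkEqualityDelete(DeleteFile{EqualityFieldIDs: []int{999}}, 2, schema))
	fmt.Println(checkEqualityDelete(DeleteFile{}, 1, schema))
}
```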
## What's left to do

This PR covers the commit API. Remaining work for full DML support:

- **Equality delete file writing**: a writer that produces Parquet files with a PK-only schema and the `EntryContentEqDeletes` content type. The RowDelta API already accepts them, but there is no convenient writer yet.
- **Equality delete reading**: the scanner currently errors with "iceberg-go does not yet support equality deletes" (`scanner.go:415`). Needs: collect equality delete entries during scan planning, match them to data files by partition + sequence number, and apply a hash-based anti-join during Arrow reads.
- **Conflict validation**: `validateFromSnapshot`, `validateNoConflictingDataFiles`, etc. Java's Flink connector skips most of this for streaming, so it is not blocking for CDC use cases.
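The hash-based anti-join mentioned for equality delete reading could look roughly like the sketch below. This is a simplified model over decoded rows with hypothetical names (`Row`, `applyEqDeletes`); the real implementation would operate on Arrow record batches and typed column values:

```go
package main

import "fmt"

// Row is a simplified stand-in for a decoded record, keyed by field ID.
// (Hypothetical; the real scan path works on Arrow batches.)
type Row map[int]any

// key projects a row onto the equality field IDs to form a hash key.
func key(r Row, fieldIDs []int) string {
	k := ""
	for _, id := range fieldIDs {
		k += fmt.Sprintf("%v|", r[id])
	}
	return k
}

// applyEqDeletes is the anti-join: build a set of keys from the delete
// rows, then keep only data rows whose projected key is absent.
func applyEqDeletes(data, deletes []Row, fieldIDs []int) []Row {
	deleted := make(map[string]struct{}, len(deletes))
	for _, d := range deletes {
		deleted[key(d, fieldIDs)] = struct{}{}
	}
	var kept []Row
	for _, r := range data {
		if _, ok := deleted[key(r, fieldIDs)]; !ok {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	data := []Row{{1: 1, 2: "alpha"}, {1: 2, 2: "beta"}, {1: 3, 2: "gamma"}}
	dels := []Row{{1: 2}} // equality delete on field ID 1 with value 2
	fmt.Println(len(applyEqDeletes(data, dels, []int{1}))) // 2
}
```

In the real scanner, a delete file would only be applied to data files whose partition matches and whose data sequence number is lower than the delete's, per the Iceberg spec's scoping rules.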
