zheliu2 opened a new pull request, #15570: URL: https://github.com/apache/iceberg/pull/15570
## Summary

Fixes #14851

When querying a changelog view with `net_changes => true`, rows carried over by copy-on-write operations (e.g., MERGE INTO) show incorrect ordinals. A row inserted in snapshot 0 and carried over in snapshot 1 would incorrectly show ordinal 1 instead of 0.

### Root Cause

In `CreateChangelogViewProcedure`, the sort spec for net changes processing sorts `_change_type` in ascending order within each `_change_ordinal`. This puts DELETE before INSERT within the same ordinal.

When `RemoveNetCarryoverIterator` processes rows sequentially, it greedily pairs a cross-ordinal INSERT (from an earlier snapshot) with a same-ordinal DELETE (carry-over), instead of first cancelling the same-ordinal DELETE/INSERT carry-over pair.

Example with a row carried over from ordinal 0 to ordinal 1:

```
Row(data, INSERT, ordinal=0) -- original insert
Row(data, DELETE, ordinal=1) -- carry-over delete (sorted first within ordinal 1)
Row(data, INSERT, ordinal=1) -- carry-over insert
```

The iterator pairs INSERT@0 with DELETE@1, cancelling both, leaving only INSERT@1 (wrong).

### Fix

Change the sort order for `_change_type` to **descending** when computing net changes. This puts INSERT before DELETE within the same ordinal:

```
Row(data, INSERT, ordinal=0) -- original insert
Row(data, INSERT, ordinal=1) -- carry-over insert (now sorted first)
Row(data, DELETE, ordinal=1) -- carry-over delete
```

The iterator now sees INSERT@0, then INSERT@1 (same type, count=2), then DELETE@1 (opposite, count=1). Result: INSERT@0 (correct).

### Testing

- Added `testNetChangesWithInitialMergeInto`: creates an unpartitioned table, writes initial data with MERGE INTO, performs a second MERGE INTO that triggers COW carry-overs, and verifies that carried-over rows retain their original ordinal.
- Updated `testNetChangesWithRemoveCarryOvers` to reflect the corrected ordinal for carried-over rows.
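The effect of the sort order can be reproduced with a small stand-alone model. The Python sketch below is not the actual `RemoveNetCarryoverIterator` code; it is a simplified counting model that assumes the iterator keeps the first row it sees for a record plus a count, increments the count on a same-type row, and decrements it on an opposite-type row. The names `sort_rows` and `net_changes` and the tuple layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# (data, change_type, change_ordinal): a hypothetical, simplified row layout.
Row = Tuple[str, str, int]


def sort_rows(rows: List[Row], change_type_descending: bool) -> List[Row]:
    """Sort by data, then ordinal, then change type (ascending or descending)."""
    # Two stable passes: the secondary key (change type) first, then the primary keys.
    by_type = sorted(rows, key=lambda r: r[1], reverse=change_type_descending)
    return sorted(by_type, key=lambda r: (r[0], r[2]))


def net_changes(sorted_rows: List[Row]) -> List[Row]:
    """Simplified counting model of net-change computation over sorted rows."""
    result: List[Row] = []
    cached = None  # first surviving row for the current record
    count = 0      # how many copies of `cached` are still uncancelled
    for row in sorted_rows:
        if count == 0:
            # Nothing pending: this row becomes the new cached row.
            cached, count = row, 1
        elif row[0] != cached[0]:
            # A different record starts: flush what survived, then restart.
            result.extend([cached] * count)
            cached, count = row, 1
        elif row[1] == cached[1]:
            # Same change type: potential net change, keep counting.
            count += 1
        else:
            # Opposite change type: cancels one pending row.
            count -= 1
    if count > 0:
        result.extend([cached] * count)
    return result


rows = [
    ("data", "INSERT", 0),  # original insert
    ("data", "DELETE", 1),  # carry-over delete
    ("data", "INSERT", 1),  # carry-over insert
]

ascending = net_changes(sort_rows(rows, change_type_descending=False))
descending = net_changes(sort_rows(rows, change_type_descending=True))
print(ascending)   # [('data', 'INSERT', 1)]
print(descending)  # [('data', 'INSERT', 0)]
```

Under this model, the ascending order cancels INSERT@0 against DELETE@1 and leaves INSERT@1, while the descending order leaves INSERT@0, matching the behavior described in the Root Cause and Fix sections.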
