zheliu2 opened a new pull request, #15570:
URL: https://github.com/apache/iceberg/pull/15570

   ## Summary
   
   Fixes #14851
   
   When querying a changelog view with `net_changes => true`, rows carried over 
by copy-on-write operations (e.g., MERGE INTO) show incorrect ordinals. A row 
inserted in snapshot 0 and carried over in snapshot 1 would incorrectly show 
ordinal 1 instead of 0.
   
   ### Root Cause
   
   In `CreateChangelogViewProcedure`, the sort spec for net changes processing 
sorts `_change_type` in ascending order within each `_change_ordinal`. This 
puts DELETE before INSERT within the same ordinal. When 
`RemoveNetCarryoverIterator` processes rows sequentially, it greedily pairs a 
cross-ordinal INSERT (from an earlier snapshot) with a same-ordinal DELETE 
(carry-over), instead of first cancelling the same-ordinal DELETE/INSERT 
carry-over pair.
   
   Example with a row carried over from ordinal 0 to ordinal 1:
   ```
   Row(data, INSERT, ordinal=0)   -- original insert
   Row(data, DELETE, ordinal=1)   -- carry-over delete (sorted first within ordinal 1)
   Row(data, INSERT, ordinal=1)   -- carry-over insert
   ```
   The iterator pairs INSERT@0 with DELETE@1, cancelling both, leaving only 
INSERT@1 (wrong).
   
   ### Fix
   
   Change the sort order for `_change_type` to **descending** when computing 
net changes. This puts INSERT before DELETE within the same ordinal:
   ```
   Row(data, INSERT, ordinal=0)   -- original insert
   Row(data, INSERT, ordinal=1)   -- carry-over insert (now sorted first)
   Row(data, DELETE, ordinal=1)   -- carry-over delete
   ```
   The iterator now sees INSERT@0, then INSERT@1 (same type, count=2), then 
DELETE@1 (opposite, count=1). Result: INSERT@0 (correct).
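
The effect of the two sort orders can be reproduced with a small standalone model. The sketch below is only an illustration of the counting behaviour described above, not the actual `RemoveNetCarryoverIterator` code: the `Change` class, the `sort` and `netChanges` helpers, and the use of a single `data` string as the row identity are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NetChangesSketch {
  /** Hypothetical stand-in for one changelog row; not an Iceberg class. */
  static final class Change {
    final String data;
    final String type; // "INSERT" or "DELETE"
    final int ordinal;

    Change(String data, String type, int ordinal) {
      this.data = data;
      this.type = type;
      this.ordinal = ordinal;
    }

    @Override
    public String toString() {
      return type + "@" + ordinal;
    }
  }

  /** Sorts by data, then ordinal, then change type (ascending or descending). */
  static List<Change> sort(List<Change> rows, boolean changeTypeDescending) {
    // "DELETE" < "INSERT" lexicographically, so ascending puts DELETE first.
    Comparator<Change> byType = Comparator.comparing(c -> c.type);
    if (changeTypeDescending) {
      byType = byType.reversed();
    }
    List<Change> sorted = new ArrayList<>(rows);
    sorted.sort(
        Comparator.<Change, String>comparing(c -> c.data)
            .thenComparingInt(c -> c.ordinal)
            .thenComparing(byType));
    return sorted;
  }

  /** Simplified sequential pairing: opposite change types cancel one-for-one. */
  static List<String> netChanges(List<Change> sorted) {
    List<String> result = new ArrayList<>();
    Change cached = null; // first surviving row of the current run
    int count = 0;        // how many uncancelled rows of cached.type remain
    for (Change row : sorted) {
      if (cached != null && !cached.data.equals(row.data)) {
        // Different row identity: emit what survived for the previous one.
        for (int i = 0; i < count; i++) {
          result.add(cached.toString());
        }
        cached = null;
      }
      if (cached == null) {
        cached = row;
        count = 1;
      } else if (cached.type.equals(row.type)) {
        count++;
      } else if (--count == 0) {
        cached = null; // fully cancelled; the next row starts a new run
      }
    }
    if (cached != null) {
      for (int i = 0; i < count; i++) {
        result.add(cached.toString());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Change> rows = List.of(
        new Change("data", "INSERT", 0),
        new Change("data", "DELETE", 1),
        new Change("data", "INSERT", 1));
    System.out.println(netChanges(sort(rows, false))); // prints [INSERT@1]
    System.out.println(netChanges(sort(rows, true)));  // prints [INSERT@0]
  }
}
```

In this model the only difference between the two runs is the direction of the change-type sort key, which matches the one-line nature of the fix: ascending order reports ordinal 1 for the carried-over row, descending order reports ordinal 0.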
   
   ### Testing
   
   - Added `testNetChangesWithInitialMergeInto`: creates an unpartitioned 
table, writes initial data with MERGE INTO, performs a second MERGE INTO that 
triggers COW carry-overs, and verifies that carried-over rows retain their 
original ordinal.
   - Updated `testNetChangesWithRemoveCarryOvers` to reflect the corrected 
ordinal for carried-over rows.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

