rahil-c opened a new issue, #17754: URL: https://github.com/apache/hudi/issues/17754
### Task Description

**What needs to be done:** Investigate why the `.copy()` workaround is needed in `HoodieSparkLanceReader`.

When running `testMultipleRegularInsertsWithCommitValidation`, I noticed a data correctness issue where `actualDF` appears to contain one row copied multiple times.

<img width="714" height="649" alt="Image" src="https://github.com/user-attachments/assets/a2b605eb-1475-4a3c-842e-34d4d2afdead" />

The method in question currently looks like this:

```
@Override
public ClosableIterator<HoodieRecord<InternalRow>> getRecordIterator(HoodieSchema schema) throws IOException {
  ClosableIterator<UnsafeRow> iterator = getUnsafeRowIterator(schema);
  // TODO: .copy() is needed for correctness
  return new CloseableMappingIterator<>(iterator, data -> unsafeCast(new HoodieSparkRecord(data)));
}
```

As a workaround, when I applied `data.copy()` in the mapping function I was able to get past this issue.

<img width="690" height="650" alt="Image" src="https://github.com/user-attachments/assets/afae1dd5-81a9-4945-966d-5e523a140b09" />

### Task Type

Code improvement/refactoring

### Related Issues

**Parent feature issue:** (if applicable)

**Related issues:**

NOTE: Use `Relationships` button to add parent/blocking issues after issue is created.
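Regarding the duplicated-row symptom in the task description: one plausible explanation, not confirmed in this issue, is that the underlying iterator hands back the same mutable row instance on every `next()` call and overwrites it in place, so any caller that buffers rows without copying ends up holding many references to the last row. The sketch below reproduces that pattern in plain Java. `MutableRow`, `reusingIterator`, and `collect` are hypothetical stand-ins written for illustration; they are not Hudi, Lance, or Spark APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class ReusedRowDemo {

    /** Hypothetical stand-in for a mutable row type (assumption, not a real Spark class). */
    static final class MutableRow {
        int value;

        /** Returns an independent snapshot of the current contents. */
        MutableRow copy() {
            MutableRow c = new MutableRow();
            c.value = this.value;
            return c;
        }
    }

    /** Iterator that overwrites and returns the SAME MutableRow instance on every next(). */
    static Iterator<MutableRow> reusingIterator(int[] values) {
        MutableRow shared = new MutableRow();
        return new Iterator<MutableRow>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < values.length;
            }

            @Override
            public MutableRow next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = values[pos++]; // mutate in place
                return shared;                // same reference every time
            }
        };
    }

    /** Buffers all mapped rows first, then reads their values back. */
    static List<Integer> collect(int[] values, Function<MutableRow, MutableRow> mapper) {
        List<MutableRow> buffered = new ArrayList<>();
        Iterator<MutableRow> it = reusingIterator(values);
        while (it.hasNext()) {
            buffered.add(mapper.apply(it.next()));
        }
        List<Integer> out = new ArrayList<>();
        for (MutableRow row : buffered) {
            out.add(row.value);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3};
        // Without copy: every buffered element is the same object, holding the last value.
        System.out.println(collect(input, row -> row));       // prints [3, 3, 3]
        // With copy: each buffered element is an independent snapshot.
        System.out.println(collect(input, MutableRow::copy)); // prints [1, 2, 3]
    }
}
```

If this is indeed the cause, the investigation would come down to whether the copy belongs in the reader's mapping function or in whichever downstream consumer buffers the records, since copying every row unconditionally has a cost when rows are consumed one at a time.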
