rahil-c opened a new issue, #17754:
URL: https://github.com/apache/hudi/issues/17754

   ### Task Description
   
   **What needs to be done:**
   
   Investigate the need for this .copy() workaround in HoodieSparkLanceReader
   
   When running `testMultipleRegularInsertsWithCommitValidation`, I noticed a
   data correctness issue where the `actualDF` appears to contain one row
   copied multiple times.
   <img width="714" height="649" alt="Image" src="https://github.com/user-attachments/assets/a2b605eb-1475-4a3c-842e-34d4d2afdead" />
   
   As a current workaround, I noticed that applying `data.copy()` in the
   following method gets me past this issue:
   
   ```
   @Override
   public ClosableIterator<HoodieRecord<InternalRow>> getRecordIterator(HoodieSchema schema) throws IOException {
     ClosableIterator<UnsafeRow> iterator = getUnsafeRowIterator(schema);
     // TODO: .copy() is needed for correctness
     return new CloseableMappingIterator<>(iterator, data -> unsafeCast(new HoodieSparkRecord(data)));
   }
   ```
   
   <img width="690" height="650" alt="Image" src="https://github.com/user-attachments/assets/afae1dd5-81a9-4945-966d-5e523a140b09" />
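
One possible explanation, which this investigation would need to confirm or rule out, is that the underlying iterator hands back the same mutable row object on every call and overwrites it in place. Spark readers commonly reuse row buffers this way, so any caller that retains rows has to copy them first. The sketch below reproduces that failure mode in plain Java without any Hudi or Spark classes. `MutableRow`, `reusingIterator`, and `collectIds` are hypothetical names used only for illustration, and nothing here is taken from the actual `HoodieSparkLanceReader` implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RowReuseDemo {

    // Hypothetical stand-in for a mutable row type such as UnsafeRow.
    static final class MutableRow {
        int id;

        // Returns an independent snapshot of the current contents.
        MutableRow copy() {
            MutableRow snapshot = new MutableRow();
            snapshot.id = this.id;
            return snapshot;
        }
    }

    // Iterator that overwrites and returns the SAME MutableRow instance on every next().
    static Iterator<MutableRow> reusingIterator(final int count) {
        final MutableRow shared = new MutableRow();
        return new Iterator<MutableRow>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < count;
            }

            @Override
            public MutableRow next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.id = produced++;
                return shared;
            }
        };
    }

    // Retains every row from the iterator, optionally copying each one first,
    // then reads the ids back after iteration has finished.
    static List<Integer> collectIds(int count, boolean copyEachRow) {
        List<MutableRow> retained = new ArrayList<>();
        Iterator<MutableRow> it = reusingIterator(count);
        while (it.hasNext()) {
            MutableRow row = it.next();
            retained.add(copyEachRow ? row.copy() : row);
        }
        List<Integer> ids = new ArrayList<>();
        for (MutableRow row : retained) {
            ids.add(row.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("without copy: " + collectIds(3, false)); // prints [2, 2, 2]
        System.out.println("with copy:    " + collectIds(3, true));  // prints [0, 1, 2]
    }
}
```

Without the copy, every retained reference points at the one shared object, so all entries show the last value written. That would match the symptom of one row appearing multiple times in `actualDF`. If this is the cause here, the open question is whether the copy belongs in `getRecordIterator` or closer to where the row buffer is reused.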
   
   ### Task Type
   
   Code improvement/refactoring
   
   ### Related Issues
   
   **Parent feature issue:** (if applicable)
   **Related issues:**
   

