yihua commented on PR #9593: URL: https://github.com/apache/hudi/pull/9593#issuecomment-1731908929
To be fully compatible with the previous behavior, every place where the `HoodieRecordPayload` APIs (`#preCombine`, `#combineAndGetUpdateValue`, and `#getInsertValue`) are called should call the merge API as well. To make sure inserts, updates, and deletes are handled correctly, here are a few scenarios you should test end-to-end with a Spark datasource read (a minimal sketch of one such test follows this list):

- Base file only
- Base file + one log file with updates (Avro type for the default record and merger; parquet type for the Spark record and merger)
- Base file + one log file with deletes (Avro type for the default record and merger; parquet type for the Spark record and merger)
- Base file + two log files with updates on the same record key, with different precombine field ordering (Avro type for the default record and merger; parquet type for the Spark record and merger; merging logic: overwrite with latest, and event-time-based ordering)
- Base file + two log files with deletes on the same record key, with different precombine field ordering (default record and merger; Spark record and merger; merging logic: overwrite with latest, and event-time-based ordering)
- Base file + one log file with updates + one log file with deletes + one log file with updates, all on the same record key, with different precombine field ordering (Avro type for the default record and merger; parquet type for the Spark record and merger; merging logic: overwrite with latest, and event-time-based ordering)

The goal is to make sure all of the complex logic around deletes and updates is still handled correctly. Note that some of these scenarios may not be covered by existing unit/functional tests.
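As a minimal sketch of the second scenario (base file + one log file with updates), something along these lines should work against a MERGE_ON_READ table with the default payload and merger. The table name, base path, and schema here are illustrative assumptions, not taken from the PR; the Avro vs. Spark record/merger dimension would be toggled through the table's merger configuration, which is left at the default in this sketch:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("merge-api-e2e-sketch")
  .master("local[2]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()
import spark.implicits._

// Hypothetical table location and write configs for a MOR table.
val basePath = "/tmp/hudi_merge_api_test"
val hudiOpts = Map(
  "hoodie.table.name"                           -> "merge_api_test",
  "hoodie.datasource.write.table.type"          -> "MERGE_ON_READ",
  "hoodie.datasource.write.recordkey.field"     -> "key",
  "hoodie.datasource.write.precombine.field"    -> "ts",
  "hoodie.datasource.write.partitionpath.field" -> "partition"
)

// Batch 1: insert -> produces the base file.
Seq(("k1", 1L, "v1", "p1")).toDF("key", "ts", "value", "partition")
  .write.format("hudi").options(hudiOpts)
  .option("hoodie.datasource.write.operation", "insert")
  .mode(SaveMode.Overwrite).save(basePath)

// Batch 2: update to the same key with a higher precombine value -> log file.
Seq(("k1", 2L, "v2", "p1")).toDF("key", "ts", "value", "partition")
  .write.format("hudi").options(hudiOpts)
  .option("hoodie.datasource.write.operation", "upsert")
  .mode(SaveMode.Append).save(basePath)

// Snapshot read must merge the log file into the base file: expect "v2".
val result = spark.read.format("hudi").load(basePath)
  .where($"key" === "k1").select("value").collect()
assert(result.map(_.getString(0)).toSeq == Seq("v2"))
```

The delete scenarios should follow the same shape, with the second batch written using `hoodie.datasource.write.operation` = `delete` and the assertion flipped to expect zero rows for the key; the multi-log-file scenarios just append more batches before the read.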