yihua commented on PR #9593: URL: https://github.com/apache/hudi/pull/9593#issuecomment-1731908929
To be fully compatible with the previous behavior, every place where the `HoodieRecordPayload` APIs (`#preCombine`, `#combineAndGetUpdateValue`, and `#getInsertValue`) are called should call the merge API as well. To make sure inserts, updates, and deletes are handled correctly, here are a few scenarios you should test end-to-end with a Spark datasource read (a minimal sketch of one such test follows this list):

- Base file only
- Base file + one log file with updates (Avro type for the default record and merger; parquet type for the Spark record and merger)
- Base file + one log file with deletes (Avro type for the default record and merger; parquet type for the Spark record and merger)
- Base file + two log files with updates on the same record key, with different precombine field ordering (Avro type for the default record and merger; parquet type for the Spark record and merger; merging logic: overwrite with latest, and event-time-based ordering)
- Base file + two log files with deletes on the same record key, with different precombine field ordering (default record and merger; Spark record and merger; merging logic: overwrite with latest, and event-time-based ordering)
- Base file + one log file with updates + one log file with deletes + one log file with updates, all on the same record key, with different precombine field ordering (Avro type for the default record and merger; parquet type for the Spark record and merger; merging logic: overwrite with latest, and event-time-based ordering)

The goal is to make sure all of the complex logic around deletes and updates is still handled correctly. Note that some of these scenarios may not be covered by existing unit/functional tests.
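As a minimal sketch of the second scenario (base file + one log file with updates), something along these lines should work against a MERGE_ON_READ table with the default payload and merger. The table name, base path, and schema here are illustrative assumptions, not taken from the PR; the Avro vs. Spark record/merger dimension would be toggled through the table's merger configuration, which is left at the default in this sketch:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("merge-api-e2e-sketch")
  .master("local[2]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()
import spark.implicits._

// Hypothetical table location and write configs for a MOR table.
val basePath = "/tmp/hudi_merge_api_test"
val hudiOpts = Map(
  "hoodie.table.name"                           -> "merge_api_test",
  "hoodie.datasource.write.table.type"          -> "MERGE_ON_READ",
  "hoodie.datasource.write.recordkey.field"     -> "key",
  "hoodie.datasource.write.precombine.field"    -> "ts",
  "hoodie.datasource.write.partitionpath.field" -> "partition"
)

// Batch 1: insert -> produces the base file.
Seq(("k1", 1L, "v1", "p1")).toDF("key", "ts", "value", "partition")
  .write.format("hudi").options(hudiOpts)
  .option("hoodie.datasource.write.operation", "insert")
  .mode(SaveMode.Overwrite).save(basePath)

// Batch 2: update to the same key with a higher precombine value -> log file.
Seq(("k1", 2L, "v2", "p1")).toDF("key", "ts", "value", "partition")
  .write.format("hudi").options(hudiOpts)
  .option("hoodie.datasource.write.operation", "upsert")
  .mode(SaveMode.Append).save(basePath)

// Snapshot read must merge the log file into the base file: expect "v2".
val result = spark.read.format("hudi").load(basePath)
  .where($"key" === "k1").select("value").collect()
assert(result.map(_.getString(0)).toSeq == Seq("v2"))
```

The delete scenarios should follow the same shape, with the second batch written using `hoodie.datasource.write.operation` = `delete` and the assertion flipped to expect zero rows for the key; the multi-log-file scenarios just append more batches before the read.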