This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new eb6a998fd85 updated delete to mention duplicates- and did some writing cleanup (#10659)
eb6a998fd85 is described below

commit eb6a998fd85deaf7fef551f74ea70b0f08cffe22
Author: nadine farah <nfara...@gmail.com>
AuthorDate: Fri Feb 23 16:16:27 2024 -0800

    updated delete to mention duplicates- and did some writing cleanup (#10659)
---
 website/docs/write_operations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/write_operations.md b/website/docs/write_operations.md
index 90b87499fe0..3146db05802 100644
--- a/website/docs/write_operations.md
+++ b/website/docs/write_operations.md
@@ -29,7 +29,7 @@ of initial load. However, this just does a best-effort job at sizing files vs gu
 Hudi supports implementing two types of deletes on data stored in Hudi tables, by enabling the user to specify a different record payload implementation.
 - **Soft Deletes** : Retain the record key and just null out the values for all the other fields. This can be achieved by ensuring the appropriate fields are nullable in the table schema and simply upserting the table after setting these fields to null.
-- **Hard Deletes** : A stronger form of deletion is to physically remove any trace of the record from the table. This can be achieved in 3 different ways.
+- **Hard Deletes** : This method entails completely eradicating all evidence of a record from the table, including any duplicates. There are three distinct approaches to accomplish this:
   - Using DataSource, set `OPERATION_OPT_KEY` to `DELETE_OPERATION_OPT_VAL`. This will remove all the records in the DataSet being submitted.
   - Using DataSource, set `PAYLOAD_CLASS_OPT_KEY` to `"org.apache.hudi.EmptyHoodieRecordPayload"`. This will remove all the records in the DataSet being submitted.
   - Using DataSource or Hudi Streamer, add a column named `_hoodie_is_deleted` to DataSet. The value of this column must be set to `true` for all the records to be deleted and either `false` or left null for any records which are to be upserted.
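
The three hard-delete approaches in the changed paragraph can be sketched as plain option dictionaries plus a row-tagging helper. This is a minimal illustration and not part of the commit. It assumes that `OPERATION_OPT_KEY` and `PAYLOAD_CLASS_OPT_KEY` correspond to the string keys `hoodie.datasource.write.operation` and `hoodie.datasource.write.payload.class`, and that `DELETE_OPERATION_OPT_VAL` is the string `delete`. The helper names and the `uuid` key field are hypothetical, and the payload class name is copied verbatim from the doc text, so check the fully qualified name against the Hudi release in use.

```python
from typing import Dict, Iterable, List, Set

# Assumed string forms of the DataSource constants named in the doc text.
OPERATION_KEY = "hoodie.datasource.write.operation"
DELETE_OPERATION = "delete"
PAYLOAD_CLASS_KEY = "hoodie.datasource.write.payload.class"
# Copied verbatim from the doc text; verify against your Hudi release.
EMPTY_PAYLOAD_CLASS = "org.apache.hudi.EmptyHoodieRecordPayload"
DELETE_FLAG_COLUMN = "_hoodie_is_deleted"


def delete_operation_options(base: Dict[str, str]) -> Dict[str, str]:
    """Approach 1: write the batch with the delete operation.

    Every record in the submitted batch is removed from the table.
    """
    return {**base, OPERATION_KEY: DELETE_OPERATION}


def empty_payload_options(base: Dict[str, str]) -> Dict[str, str]:
    """Approach 2: write the batch with the empty record payload class.

    Every record in the submitted batch is removed from the table.
    """
    return {**base, PAYLOAD_CLASS_KEY: EMPTY_PAYLOAD_CLASS}


def flag_deletes(
    records: Iterable[Dict[str, object]],
    keys_to_delete: Set[str],
    key_field: str = "uuid",  # hypothetical record key column
) -> List[Dict[str, object]]:
    """Approach 3: add `_hoodie_is_deleted` to each record.

    Records whose key is in `keys_to_delete` get True (deleted);
    all others get False and are upserted as usual.
    """
    return [
        {**record, DELETE_FLAG_COLUMN: record[key_field] in keys_to_delete}
        for record in records
    ]
```

With PySpark, either option dictionary would typically be passed through `df.write.format("hudi").options(**opts).mode("append").save(path)`. The third approach uses an ordinary upsert write and differs only in the extra boolean column on the submitted rows.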