maheshguptags opened a new issue, #7613:
URL: https://github.com/apache/hudi/issues/7613

   **Unable to delete records with Spark from a table generated by a Flink Hudi job**
   
   I have been trying to delete records from a Hudi table using PySpark, where
   the table was generated by a Flink Hudi job. When I run the delete job
   using `config1`, it adds `deltacommit` entries to the timeline but does
   not delete the records.
   
   Whereas when I try with `config2`, it creates a `rollback` followed by a
   `deltacommit` in the `.hoodie` folder and writes an empty Parquet file in
   the partition bucket. I want to understand why `config2` triggers the
   `rollback` and creates the empty Parquet file.
   
   `config1`
   
   ```python
   hudi_options_write = {
       'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
       'hoodie.datasource.write.recordkey.field': 'list_id,customer_id,client_id',
       'hoodie.table.name': tableName,
       'hoodie.datasource.write.partitionpath.field': 'client_id',
       'hoodie.datasource.write.operation': 'delete',
       'hoodie.datasource.write.precombine.field': 'created_date'
   }
   ```
   
   `config2`
   
   ```python
   hudi_options_write = {
       'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
       'hoodie.datasource.write.recordkey.field': 'list_id,customer_id,client_id',
       'hoodie.table.name': tableName,
       'hoodie.datasource.write.partitionpath.field': 'client_id',
       'hoodie.datasource.write.operation': 'upsert',
       'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.EmptyHoodieRecordPayload',
       'hoodie.datasource.write.precombine.field': 'created_date'
   }
   ```
   I want to know why the delete operation is not working properly here,
   while with `config1` I am able to delete records written by a `spark hudi`
   job.
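   
   For reference, this is roughly how I apply the options on the write side
   (a minimal sketch; `records_to_delete` and `base_path` are placeholders,
   not the actual values from my job):
   
   ```python
   # Minimal sketch of issuing the delete, assuming `records_to_delete` is a
   # DataFrame holding the rows to remove (including the record key and
   # partition path fields) and `base_path` is the table's S3 location.
   records_to_delete.write.format("hudi") \
       .options(**hudi_options_write) \
       .mode("append") \
       .save(base_path)
   ```
   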
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write some data into a Hudi table using a `Flink hudi` job
   2. Read it back using PySpark
   3. Apply a filter and try to delete the records using `config1` and `config2` (see the sketch after this list)
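   
   A minimal sketch of steps 2 and 3 (the table location and filter predicate
   below are placeholders; the real values are specific to my setup):
   
   ```python
   # Step 2: read the Flink-written table back with Spark.
   base_path = "s3://<bucket>/<table-path>"  # placeholder location
   df = spark.read.format("hudi").load(base_path)
   
   # Step 3: filter down to the rows to delete; the delete itself is then
   # issued with the write sketch shown above, once with config1's options
   # and once with config2's.
   records_to_delete = df.filter("client_id = 'c1'")  # placeholder predicate
   ```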
   
   **Expected behavior**
   
   I want to delete records using `spark` from a table that was generated by a Flink job.
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : Spark 3.3.0
   
   * Hive version : Hive 3.1.3
   
   * Hadoop version : Hadoop 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Image**
   
   
![image](https://user-images.githubusercontent.com/115445723/210947376-6f39de32-2d8b-4ff7-9a42-2eeeb287b6e6.png)
   
   **Stacktrace**
   
[stacktrace_rollback_delete.log](https://github.com/apache/hudi/files/10357976/stacktrace_rollback_delete.log)
   
   

