Prasagnya opened a new issue, #9493: URL: https://github.com/apache/hudi/issues/9493
I am trying to delete partitions by issuing a save command on an empty Spark DataFrame. I expect Hudi to update its metadata and also delete the actual parquet files under the destination root folder (based on the partition paths).

**Steps to reproduce the behavior:**

I use this code:

```python
ti_data.write.format("hudi") \
    .option("hoodie.datasource.write.operation", "delete_partition") \
    .option("hoodie.datasource.write.partitions.to.delete.key", "gs://region/partition=1/*") \
    .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE") \
    .option("hoodie.datasource.write.payload.class", "org.apache.hudi.common.model.EmptyHoodieRecordPayload") \
    .option("hoodie.datasource.write.partitionpath.field", "NODE_TYPE,NODE,ITEM") \
    .option("hoodie.datasource.write.hive_style_partitioning", "true") \
    .mode("append") \
    .save("gs://region/")
```

In my example, destUrl is `gs://region`, and that is where the Hudi table metadata (`.hoodie`) lives.

**Expected behavior**

After I run the above code with the example partitions to delete, I expect the folder `partition=1` to be deleted. I also expect the Hudi metadata in the root folder, that is, `s3://data/region=TEST`, to be modified to reflect this operation.

**Environment Description**

* Hudi version: 0.10.1

**Actual result**

After I run the above and do a show statement on `gs://region/`, it is still showing me the data set.

Per https://github.com/apache/hudi/issues/6866#issuecomment-1283462233, the delete is lazy, but how much time would the cleaner take after we trigger it? @nsivabalan, please help with this.
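Since `delete_partition` is lazy (it marks the partition deleted in the timeline, and the cleaner physically removes the parquet files later), the timing depends on the cleaner configuration. A minimal sketch of the cleaner-related write options (option names taken from Hudi's cleaning configuration docs; the values shown are illustrative assumptions, not recommendations):

```python
# Sketch: Hudi cleaner options that control when files marked deleted
# are physically removed. Values here are illustrative only.
cleaner_opts = {
    # Run the cleaner automatically as part of each commit (this is the default).
    "hoodie.clean.automatic": "true",
    # Number of commits to retain; file versions outside this window
    # become eligible for physical deletion by the cleaner.
    "hoodie.cleaner.commits.retained": "1",
}

# These would be merged into the write, e.g.:
# ti_data.write.format("hudi").options(**cleaner_opts)...
```

With automatic cleaning on, the files should be removed once enough subsequent commits have accumulated for the deleted file slices to fall outside the retained window.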