Prasagnya opened a new issue, #9493: URL: https://github.com/apache/hudi/issues/9493
I am trying to delete partitions by issuing a save command on an empty Spark DataFrame. I expect Hudi to update its metadata and also delete the actual parquet files under the destination root folder (based on the partition paths).

**Steps to reproduce the behavior:**

I use this code:

```python
ti_data.write.format("hudi") \
    .option("hoodie.datasource.write.operation", "delete_partition") \
    .option("hoodie.datasource.write.partitions.to.delete.key", "gs://region/partition=1/*") \
    .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE") \
    .option("hoodie.datasource.write.payload.class", "org.apache.hudi.common.model.EmptyHoodieRecordPayload") \
    .option("hoodie.datasource.write.partitionpath.field", "NODE_TYPE,NODE,ITEM") \
    .option("hoodie.datasource.write.hive_style_partitioning", "true") \
    .mode("append") \
    .save("gs://region/")
```

In my example, destUrl is `gs://region`, and that is where the Hudi table metadata (`.hoodie`) lives.

**Expected behavior**

After I run the above code with the example partitions to delete, I expect the folder `partition=1` to be deleted. I also expect the Hudi metadata in the root folder, that is, `s3://data/region=TEST`, to be modified to reflect this operation.

**Environment Description**

* Hudi version: 0.10.1

**Actual result**

After I run the above and do a show statement on `gs://region/`, it is still showing me the data set.

Per https://github.com/apache/hudi/issues/6866#issuecomment-1283462233, the delete is lazy, but how much time would the cleaner take after we trigger it? @nsivabalan, please help with this.
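Since `delete_partition` is lazy (it marks the partition deleted in the timeline, and the cleaner physically removes the parquet files later), the timing depends on the cleaner configuration. A minimal sketch of the cleaner-related write options (option names taken from Hudi's cleaning configuration docs; the values shown are illustrative assumptions, not recommendations):

```python
# Sketch: Hudi cleaner options that control when files marked deleted
# are physically removed. Values here are illustrative only.
cleaner_opts = {
    # Run the cleaner automatically as part of each commit (this is the default).
    "hoodie.clean.automatic": "true",
    # Number of commits to retain; file versions outside this window
    # become eligible for physical deletion by the cleaner.
    "hoodie.cleaner.commits.retained": "1",
}

# These would be merged into the write, e.g.:
# ti_data.write.format("hudi").options(**cleaner_opts)...
```

With automatic cleaning on, the files should be removed once enough subsequent commits have accumulated for the deleted file slices to fall outside the retained window.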