hudi-bot opened a new issue, #16378:
URL: https://github.com/apache/hudi/issues/16378

   We have a Spark Structured Streaming job writing data to Hudi tables. After an 
upgrade to Hudi 0.11, we found thousands of files under `.hoodie/metadata` that 
were never cleaned or archived, which impacts the overall processing of the 
streaming job. I found a similar issue in 
[https://github.com/apache/hudi/issues/7472], where it was mentioned that this 
was fixed in 0.13. Since we hit this issue in Prod, we cannot upgrade to 0.13 
for now. I found that I can run a separate spark-submit job to execute 
HoodieCleaner. I also found that deleting the Hudi metadata from hudi-cli could 
be an option, but I am not sure if it's safe, since the streaming job uses the 
upsert operation.
   Please advise on the best way to force cleaning and archiving of the 
metadata files.
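   For reference, a sketch of the standalone-cleaner approach mentioned above, 
using Hudi's `org.apache.hudi.utilities.HoodieCleaner` class. The bundle jar 
path, table base path, and retention values below are placeholders for your 
environment, and the cleaner policy shown is just one common choice, not a 
recommendation from this issue:

   ```shell
   # Run Hudi's standalone cleaner via spark-submit (paths and retention
   # values are placeholders; adjust to your deployment).
   spark-submit \
     --class org.apache.hudi.utilities.HoodieCleaner \
     /path/to/hudi-utilities-bundle_2.12-0.11.1.jar \
     --target-base-path s3://your-bucket/path/to/hudi/table \
     --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS \
     --hoodie-conf hoodie.cleaner.commits.retained=10
   ```

   Note that running this concurrently with the streaming writer may require 
care (e.g. a lock provider for multi-writer setups); running it during a quiet 
window is the conservative option.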
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-7332
   - Type: Bug
   - Fix version(s):
     - 1.1.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
