SabyasachiDasTR opened a new issue, #7600: URL: https://github.com/apache/hudi/issues/7600
**Describe the problem you faced**

We are incrementally upserting data into our Hudi table(s) every 5 minutes. We have set `CLEANER_POLICY` to `KEEP_LATEST_BY_HOURS` with `CLEANER_HOURS_RETAINED = 48`. However, old delta log files from 2 months back are still present in our partitions, and the CLI shows the last clean ran 2 months ago, in November. No clean action appears to be performed on the old log files. The only operation we execute is upsert, we have a single writer, and compaction runs every hour. We believe this is causing our EMR job to underperform and crash repeatedly, because a very large number of delta log files pile up in the partitions and compaction has to read them all while processing the job.

![MicrosoftTeams-image (33)](https://user-images.githubusercontent.com/52735405/210500715-89227935-b74a-418a-9701-5b783c56a74e.png)

**Options used during Upsert:**

![HudiOptionsLatest](https://user-images.githubusercontent.com/52735405/210503366-77d47c7c-169f-4a87-8234-0971079a9347.PNG)

**Writing to S3:**

![Upsertcmd](https://user-images.githubusercontent.com/52735405/210501558-28eb3712-fed8-4c93-9c85-ccb6ef3521dc.PNG)

Partition structure: `s3://bucket/table/partition/` containing parquet and `.log` files

**Expected behavior**

As per my understanding, log files older than `CLEANER_HOURS_RETAINED` (48 hours, i.e. 2 days) should be deleted.

**Environment Description**

* Hudi version : 0.11.1
* Spark version : 3.2.1
* Hive version : Hive not installed; EMR cluster emr-6.7.0
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No
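For reference, the retention setup described above corresponds to Hudi write options along these lines (a minimal sketch of the relevant keys only — the table name and the compaction cadence of 12 delta commits are assumptions for illustration; the full option set actually used is in the screenshot):

```python
# Sketch of the cleaner/compaction-related Hudi write options described in
# this issue. Table name and compaction cadence are illustrative assumptions.
hudi_options = {
    "hoodie.table.name": "my_table",                        # hypothetical name
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    # Cleaner: retain file slices written in the last 48 hours
    "hoodie.cleaner.policy": "KEEP_LATEST_BY_HOURS",
    "hoodie.cleaner.hours.retained": "48",
    "hoodie.clean.automatic": "true",                       # clean inline after commits
    # Inline compaction roughly hourly (12 x 5-minute commits -- an assumption)
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "12",
}

# In PySpark these would be passed on write, e.g.:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

If cleaning never runs despite `hoodie.clean.automatic` being enabled, checking the timeline for pending clean/compaction instants (e.g. via `cleans show` in hudi-cli) is a reasonable first diagnostic step.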