PhantomHunt opened a new issue, #8678:
URL: https://github.com/apache/hudi/issues/8678

   **Describe the problem you faced**
   
   We have 20 Hudi MOR tables with this configuration:
   ```
   hudi_config = {
        'hoodie.datasource.write.operation': 'upsert',  
        'hoodie.datasource.write.precombine.field': 'cdc_timestamp',
        'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
        'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
        'hoodie.schema.on.read.enable' : "true",
        'hoodie.datasource.write.reconcile.schema' : "true",
        'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
        'hoodie.table.name': table_name,
        'hoodie.datasource.write.recordkey.field': id,
        'hoodie.datasource.write.table.name': table_name,
        'hoodie.upsert.shuffle.parallelism': 200,
        'hoodie.keep.max.commits': 50,
        'hoodie.keep.min.commits': 40,
        'hoodie.cleaner.commits.retained': 30
   }
   ```
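
As a quick sanity check on the retention settings above, here is a minimal sketch. It assumes the usual expectation that `hoodie.cleaner.commits.retained` is smaller than `hoodie.keep.min.commits`, which in turn is smaller than `hoodie.keep.max.commits`. The helper name `retention_order_ok` is hypothetical and not part of Hudi.

```python
def retention_order_ok(config):
    """Check the assumed ordering: cleaner retained < keep.min < keep.max."""
    retained = int(config["hoodie.cleaner.commits.retained"])
    min_commits = int(config["hoodie.keep.min.commits"])
    max_commits = int(config["hoodie.keep.max.commits"])
    return retained < min_commits < max_commits


# Values copied from the configuration posted above.
posted = {
    "hoodie.keep.max.commits": 50,
    "hoodie.keep.min.commits": 40,
    "hoodie.cleaner.commits.retained": 30,
}

print(retention_order_ok(posted))
```

Under that assumption the posted values are ordered consistently, so the ordering itself is unlikely to be what blocks the cleaner.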
   Our code works fine, but we recently noticed that the S3 bucket of one 
of our 20 tables has 999+ objects (many parquet and log files dating back to 
28 March 2023). It seems that compaction and the cleaner are not working 
properly for this table. Screenshot of the S3 structure:
![image](https://github.com/apache/hudi/assets/36084173/7e6893fc-f7be-4b3c-8f21-89c76baba4f9)
 We can also see a compaction that is inflight but never completed.
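
A minimal sketch of how the stuck compaction could be confirmed from a listing of the table's `.hoodie` folder. It assumes the 0.x timeline layout, where a compaction appears as `<instant>.compaction.requested` and `<instant>.compaction.inflight` and is complete once `<instant>.commit` exists. The function name and the file names in the example are hypothetical, not taken from our table.

```python
def pending_compactions(timeline_files):
    """Return instants with a requested/inflight compaction but no completed commit."""
    started = set()
    completed = set()
    for name in timeline_files:
        # Split "<instant>.<suffix>" at the first dot.
        instant, _, suffix = name.partition(".")
        if suffix in ("compaction.requested", "compaction.inflight"):
            started.add(instant)
        elif suffix == "commit":
            completed.add(instant)
    return sorted(started - completed)


# Hypothetical file names for illustration only.
example = [
    "20230328101500000.deltacommit",
    "20230328110000000.compaction.requested",
    "20230328110000000.compaction.inflight",
    "20230329090000000.compaction.requested",
    "20230329090000000.compaction.inflight",
    "20230329090000000.commit",
]

print(pending_compactions(example))
```

Any instant this returns has a compaction plan that was scheduled or started but never finished, which is the state described above.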
   
   **Expected behavior**
   
   Hudi should run compaction and clean up old data beyond the retention 
values set in the configuration.
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   
   * Spark version : 3.3.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   **Additional context**
   
   We tried to inspect the configuration via the Hudi CLI but ran into 
[issue 8676](https://github.com/apache/hudi/issues/8676).
   
   We modified the compaction and log compaction configuration but ran into 
[issue 8677](https://github.com/apache/hudi/issues/8677).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org