BalaMahesh opened a new issue, #7733: URL: https://github.com/apache/hudi/issues/7733
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

We have Postgres data coming from the Debezium connector via Kafka. We run Hudi in upsert mode on this dataset and have found around 12 records that have two versions of data for the same id, instead of the record being updated to the latest values and the old version being cleaned up. We are not yet clear how this happened, since the data is from older commits.

**To Reproduce**

Steps to reproduce the behavior:

1. Start the Postgres Debezium Kafka connector and publish data to Kafka.
2. Run Hudi in upsert mode.
3. We are not sure whether any crashes happened during those commits.
4. Use the configurations below.

hoodie.compaction.payload.class=org.apache.hudi.common.model.debezium.PostgresDebeziumAvroPayload
hoodie.table.type=MERGE_ON_READ
hoodie.table.metadata.partitions=
hoodie.table.precombine.field=_event_lsn
hoodie.table.partition.fields=
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.checksum=4134192528
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.recordkey.fields=id
hoodie.partition.metafile.use.base.format=false
hoodie.populate.meta.fields=true
hoodie.table.keygenerator.class=org.apache.hudi.keygen.NonpartitionedAvroKeyGenerator
hoodie.table.base.file.format=PARQUET
hoodie.table.version=5

**Expected behavior**

We expect only one version of each record to be available in the latest queried data.

**Environment Description**

* Hudi version : 0.12.1
* Spark version : 3.2.1
* Hive version : 2.3.5
* Hadoop version : 2.7.7
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : yes
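To make the expected behavior concrete, here is a minimal sketch in plain Python (not Hudi code, and not how `PostgresDebeziumAvroPayload` is implemented). It assumes that, for a given `id`, the row with the larger `_event_lsn` should be the only survivor of an upsert, since `_event_lsn` is the configured precombine field. The sample rows are copied from the table in the Stacktrace section; the function names `duplicate_keys` and `expected_survivors` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict

# Columns: (id, updated, _hoodie_commit_time, _event_lsn).
# Rows copied from the report; only two of the affected ids are shown.
ROWS = [
    ("Aa5udG", 1667998354, "20221109125316627", 5037873812216),
    ("Aa5udG", 1667972649, "20221109055102633", 5028051185232),
    ("OiYzUz", 1672662739, "20230112125716390", 6287523270024),
    ("OiYzUz", 1672662739, "20230112125716390", 6287523270024),
]


def duplicate_keys(rows):
    """Return {id: [rows]} for every id that appears more than once."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)
    return {key: group for key, group in by_id.items() if len(group) > 1}


def expected_survivors(rows):
    """Keep one row per id: the one with the largest _event_lsn.

    Assumption: a larger _event_lsn means a newer change event.
    Ties keep the first row seen.
    """
    latest = {}
    for row in rows:
        key, lsn = row[0], row[3]
        if key not in latest or lsn > latest[key][3]:
            latest[key] = row
    return latest


if __name__ == "__main__":
    for key, group in duplicate_keys(ROWS).items():
        print(key, "has", len(group), "versions")
    for key, row in expected_survivors(ROWS).items():
        print(key, "expected survivor:", row)
```

Note that the two `OiYzUz` rows are identical in every reported column, including `_hoodie_commit_time`, so that pair is an exact duplicate within one commit rather than an old and a new version.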
**Additional context**

hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedAvroKeyGenerator
hoodie.cleaner.policy=KEEP_LATEST_COMMITS
hoodie.clean.automatic=true
hoodie.clean.async=true
hoodie.cleaner.commits.retained=5
hoodie.keep.min.commits=10
#compaction config
hoodie.datasource.compaction.async.enable=true
hoodie.parquet.small.file.limit=104857600
hoodie.compaction.target.io=50

**Stacktrace**

```
Id      updated     _hoodie_commit_time  _event_lsn
Aa5udG  1667998354  20221109125316627    5037873812216
Aa5udG  1667972649  20221109055102633    5028051185232
Aa61Gb  1667998400  20221109125802500    5037878072632
Aa61Gb  1667972837  20221109055102633    5028239838008
Aa7hZx  1667998411  20221109125802500    5037879344768
Aa7hZx  1667973014  20221109055102633    5028334998944
Aa81Sq  1667998439  20221109125802500    5037897355680
Aa81Sq  1667973345  20221109055825061    5028484902408
AbB9sW  1668051396  20221110034051271    5045161427664
AbB9sW  1667974610  20221109061740419    5029141615480
OiYzUz  1672662739  20230112125716390    6287523270024
OiYzUz  1672662739  20230112125716390    6287523270024
XxNzFk  1667982183  20221109082337760    5031758334520
XxNzFk  1667981380  20221109081024733    5031516715520
YxNzFk  1667982167  20221109082337760    5031758226096
YxNzFk  1667981376  20221109081024733    5031516565840
YbB9sW  1668051393  20221110034051271    5045160856976
YbB9sW  1667974609  20221109061740419    5029141513960
ZxNzFk  1667982174  20221109082337760    5031755205544
ZxNzFk  1667981375  20221109081024733    5031516243272
ZanXvJ  1668051273  20221110033657677    5045153106408
ZanXvJ  1667967621  20221109042439193    5025825527744
ZbB9sW  1668051391  20221110034051271    5045160222496
ZbB9sW  1667974609  20221109061740419    5029141376128
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org